# install the orchaRd package
install.packages("devtools")
install.packages("tidyverse")
# install.packages('metafor')# for section S5, one has to install the developing
# version of metafor using:
remotes::install_github("wviechtb/metafor")
install.packages("patchwork")
install.packages("R.rsp")
devtools::install_github("itchyshin/orchard_plot", subdir = "orchaRd", force = TRUE,
build_vignettes = TRUE)
# install the dmetar package
if (!require("devtools")) {
install.packages("devtools")
}
devtools::install_github("MathiasHarrer/dmetar")
# getting metaAidR
install.packages("devtools")
library(devtools)
install_github("daniel1noble/metaAidR")
Ask Rose to finish this off - references are not done etc - need fixing - use (ask Yefeng and Alfredo do help as well) – e.g. (Viechtbauer, 2010)
The main objective of the literature survey was to capture the types of publication bias methods recently used in the field of ecology and evolution. The survey reported here was conducted alongside a larger survey on reporting standards from the PRISMA-EcoEvo project (Preferred Reporting Items for Systematic reviews and Meta-Analyses in Ecology and Evolutionary Biology; full details and materials provided in O’Dea et al. (2019). The survey was based on 102 meta-analysis papers published between 1 January 2010 and 25 March 2019. We aimed for the sample of 102 meta-analyses to be representative of recently published meta-analyses in ecology and evolutionary biology journals.
To select the representative sample of meta-analysis publications, we first searched the Scopus database for papers with (“meta-analy*” OR “metaanaly*” OR “meta-regression”) in the title, abstract, or keywords (where the asterisk allows for any ending to the word, such as “meta-analyses” or “meta-analysis”), restricted the date to papers published after 2009, and restricted the ISSN (International Standard Serial Number) to journals classified as ‘Ecology’ and/or ‘Evolutionary Biology’ by the 2017 rankings of the ISI InCites Journal Citation Reports. On 25 March 2019 this search returned 1,668 papers from 134 journals.
Second, the returned papers were categorised as being from journals listed as ‘Ecology,’ ‘Evolution,’ or ‘Both,’ and we reduced the number of journals and papers. To reduce the number of journals we arranged journals in descending order of frequency (i.e. to indicate which journals published the most meta-analyses). After removing journals in more applied sub-fields (such as ecology economics), we retained the top 10 journals classified as ecology, the top 10 journals classified as evolutionary biology, and the top 11 journals classified as both ecology and evolutionary biology. The list of 31 retained journals is shown in Table S1. Next, we selected 297 studies to be screened for inclusion in the sample, by using the ‘sample’ function in R (v. 3.5.1; R Core Team, 2018) to randomly select up to 17 studies from the journals classified as ‘Evolutionary Biology’ (57 studies total) or ‘Both’ (150 studies total), and up to 9 studies from journals classified as ‘Ecology’ (90 studies total).
Journals screened in our search for a representative sample of meta-analyses published in ecology and evolutionary biology. ISI Classification is based on the 2017 ISI InCites Journal Citation Reports. The number of papers returned from the Scopus database search is described by ‘N hits,’ while the number of studies selected for screening is described by ‘N screened.’
# getting the data and formatting some variables (turning character vectors to
# factors)
read_excel(here("data/TableS1.xlsx"), na = "NA") %>% # mutate_if(is.character, as.factor) %>%
kbl() %>% kable_paper() %>% kable_styling("striped", position = "left")
| ISI Classification | Full Journal Title | N hits | N screened |
|---|---|---|---|
| Both | Proceedings of the Royal Society of London B: Biological Sciences | 47 | 17 |
| Both | Ecology and Evolution | 43 | 17 |
| Both | Molecular Ecology | 35 | 17 |
| Both | Journal of Evolutionary Biology | 32 | 17 |
| Both | Evolution | 25 | 17 |
| Both | American Naturalist | 20 | 17 |
| Both | Biology Letters | 17 | 17 |
| Both | Evolutionary Ecology | 14 | 14 |
| Both | Nature Ecology & Evolution | 7 | 7 |
| Both | Heredity | 5 | 5 |
| Both | Molecular Ecology Resources | 5 | 5 |
| Ecology | Global Change Biology | 135 | 9 |
| Ecology | Ecology Letters | 98 | 9 |
| Ecology | Oikos | 65 | 9 |
| Ecology | Global Ecology and Biogeography | 51 | 9 |
| Ecology | Biological Conservation | 49 | 9 |
| Ecology | Oecologia | 47 | 9 |
| Ecology | Conservation Biology | 43 | 9 |
| Ecology | Journal of Applied Ecology | 42 | 9 |
| Ecology | Journal of Ecology | 35 | 9 |
| Ecology | Marine Ecology Progress Series | 35 | 9 |
| Evolution | BMC Evolutionary Biology | 12 | 12 |
| Evolution | Evolutionary Applications | 10 | 10 |
| Evolution | Biological Journal of the Linnean Society | 9 | 9 |
| Evolution | Molecular Biology and Evolution | 7 | 7 |
| Evolution | Genome Biology and Evolution | 4 | 4 |
| Evolution | Molecular Phylogenetics and Evolution | 4 | 4 |
| Evolution | Evolutionary Bioinformatics | 3 | 3 |
| Evolution | Evolutionary Biology | 3 | 3 |
| Evolution | Journal of Heredity | 3 | 3 |
| Evolution | Systematic Biology | 2 | 2 |
The sample of 102 meta-analysis papers met the following four inclusion criteria: (1) the study addressed a question in the fields of ecology and evolutionary biology; (2) claimed to present results from a meta-analysis; (3) performed a search for, and collected, data from the primary literature; (4) used a statistical model to analyse effect sizes that were collected from multiple studies. Paper screening was conducted in two stages. First, two authors (RO and ML) conducted parallel abstract screening. Conflicting decisions were discussed and resolved. Second, 64% of screened abstracts underwent parallel full-text screening by RO and ML, in consultation with SN.
Papers were independently assessed by seven authors (RO, DN, JK, MJ, ML, RO, SN, and TP) as part of a larger, time-consuming survey. The one survey question pertaining to this project was: “Which publication bias tests are reported in the paper? (Select all that apply).” There were 11 possible choices for respondents to select: (A) Funnel plots (including contour-enhanced funnel plots); (B) Normal quantile (QQ) plots (Wang & Bushman); (C) Correlation-based tests (e.g. Begg & Manzumdar rank correlation); (D) Regression-based tests (e.g. Egger regression and its variants); (E) File drawer numbers or fail-safe N (Rosenthal, Orwin or Rosenberg method); (F) Trim-and-fill tests; (G) P-curve, P-uniform or its variants; (H) Selection (method) models (e.g. Copas, Hedges or lyengar & Greenhouse model); (I) Time-lag bias tests (e.g., regression or correlation on the relationship between effect sizes and time or cumulative meta-analysis); (J) None reported and (K) ‘Other’ methods. According to Sutton (2009) (see also Vevea et al., 2019), Methods A-D are tests detecting publication bias, whereas Methods E-F are assessing the impact of publication bias.
Among the 102 assessed papers, 17.8% did not report any tests of publication bias. Most meta-analysis papers reported one or more tests of publication bias (17.8% of assessed papers did not include any assessment of publication bias). These results suggest tests of publication bias have become more common in recent year in ecology and evolutionary biology, as over half of older meta-analyses assessed by Nakagawa and Santos (2012) and Koricheva and Gurevitch (2014) did not report any tests of publication bias (although our results are not directly comparable, due to different survey methods). Still, inferential tests of publication bias remain uncommon. By far the most popular test of publication bias were funnel plots (32.4%; Table S2), with all remaining methods represented by fewer than 15% of papers. All methods except ‘selection models’ were present in at least one paper (with ‘other’ being selected for a weighted histogram used by Loydi et al. (2013; Table S2). The absence of selection model methods could be because these methods are comparatively technically challenging, or because ecologists and evolutionary biologists are not yet aware of the benefits of these methods.
Frequency with which publication bias tests were reported in the 102 meta-analysis publications, ranked in order of decreasing popularity. No tests were reported for 17.80% of papers.
# getting the data and formatting some variables (turning character vectors to
# factors)
read_excel(here("data/TableS2.xlsx"), na = "NA") %>% # mutate_if(is.character, as.factor) %>%
kbl() %>% kable_paper() %>% kable_styling("striped", position = "left")
| Publication Bias Test | Number of Papers Reporting Test | Percentage of Papers Reporting Test |
|---|---|---|
|
69 | 0.324 |
|
30 | 0.141 |
|
25 | 0.117 |
|
20 | 0.094 |
|
16 | 0.075 |
|
10 | 0.047 |
|
3 | 0.014 |
|
1 | 0.005 |
|
1 | 0.005 |
|
0 | 0.000 |
As in the main text, Equation 8 is as follows \[ z_{i} = \beta_{0} + \beta_{1}prec_i + e_i,\\ \] where \(e_{i} \sim \mathcal{N}(0, \sigma_e^2)\). We can rewrite this as:
\[ y_{i}/se_i = \beta_{0} + \beta_{1}(1/se_i) + e_i,\\ \] If we multiple both side by \(se_i\), we have:
\[ y_{i} = \beta_{0}se_i + \beta_{1} + e_{i}se_{i}, \]
which is basically the same, if \(\beta_0\) and \(\beta_1\) are swapped, as:
\[ y_{i} = \beta_{0} + \beta_{1}se_i + \epsilon_i, \] where \(\epsilon_i \sim \mathcal{N}(0, v_i\phi)\) as in Equation 9. We can show this using a simulated data as well (we use the data used for Figure 3 and 4). Below the first result is from Equation 8 and the second Equation 9:
# creating data - the same data as Figures 3 + 4
set.seed(77777)
# setting parameters
n.effect <- 100
sigma2.s <- 0.05
beta1 <- 0.2
# using negative binomial we get good spread of sample size
n.sample <- rnbinom(n.effect, mu = 30, size = 0.7) + 4
# variance for Zr
vi <- 1/(n.sample - 3)
# moderator x 1
xi <- rnorm(n.effect)
# there is underling overall effect to r = 0.2 or Zr = 0.203
Zr <- atanh(0.2) + beta1 * xi + rnorm(n.effect, 0, sqrt(sigma2.s)) + rnorm(n.effect,
0, sqrt(vi))
# qplot(Zr, 1/sqrt(vi))
# data frame
dat <- data.frame(yi = Zr, vi = vi, sei = sqrt(vi), xi = xi, ni = n.sample, prec = 1/sqrt(vi),
wi = 1/vi, zi = Zr/sqrt(vi))
rows <- 1:nrow(dat)
expected <- which(1/dat$sei < 5 & dat$yi < 0.25)
unexpected <- which(1/dat$sei > 4.7 & dat$yi > 0.25)
col_point <- ifelse(rows %in% expected, "red", ifelse(rows %in% unexpected, "blue",
"black"))
dat$col_point <- col_point
# data with 'publication bias'
dat2 <- dat[dat$col_point != "red", ]
# they are the same
eq_8 <- lm(zi ~ prec, data = dat2)
eq_9 <- lm(yi ~ sei, weight = wi, data = dat2)
summary(eq_8)
##
## Call:
## lm(formula = zi ~ prec, data = dat2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.389 -1.011 -0.027 0.969 5.082
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.6657 0.4465 3.73 0.00038 ***
## prec -0.0382 0.0741 -0.52 0.60743
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.73 on 73 degrees of freedom
## Multiple R-squared: 0.00363, Adjusted R-squared: -0.01
## F-statistic: 0.266 on 1 and 73 DF, p-value: 0.607
summary(eq_9)
##
## Call:
## lm(formula = yi ~ sei, data = dat2, weights = wi)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -5.389 -1.011 -0.027 0.969 5.082
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.0382 0.0741 -0.52 0.60743
## sei 1.6657 0.4465 3.73 0.00038 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.73 on 73 degrees of freedom
## Multiple R-squared: 0.16, Adjusted R-squared: 0.149
## F-statistic: 13.9 on 1 and 73 DF, p-value: 0.000375
As you can see above, Equation 8’s intercept is identical the slope of Equation 9 whiile Equation 8’s slope to Equation 9’s intercept. Note that corresponding standard errors (SE) are also the same.
We use the function fsn() in the R package, metafor, as in this web page (link).
# an example data using risk ratio (not response ratio)
dat <- escalc(measure = "RR", ai = tpos, bi = tneg, ci = cpos, di = cneg, data = dat.bcg)
fsn(yi, vi, data = dat, type = "Rosenthal")
##
## Fail-safe N Calculation Using the Rosenthal Approach
##
## Observed Significance Level: <.0001
## Target Significance Level: 0.05
##
## Fail-safe N: 598
# setting effect size targe as -0.1
fsn(yi, vi, data = dat, type = "Orwin", target = 0.1)
##
## Fail-safe N Calculation Using the Orwin Approach
##
## Average Effect Size: -0.7407
## Target Effect Size: -0.1000
##
## Fail-safe N: 84
fsn(yi, vi, data = dat, type = "Rosenberg")
##
## Fail-safe N Calculation Using the Rosenberg Approach
##
## Average Effect Size: -0.4303
## Observed Significance Level: <.0001
## Target Significance Level: 0.05
##
## Fail-safe N: 370
In this section, we provide worked examples to show how to test for publication bias in meta-analytic datasets with high heterogeneity and several layers of non-independence, including phylogenetic non-independence, which are common features of meta-analytic datasets in ecology and evolution (Senior et al. 2016, Noble et al. 2017). We used two openly available meta-analytic datasets to show how to implement our suggested multilevel meta-regression method (see main text) on meta-analytic datasets with correlation effect sizes (Zr: section S4.1) and effect sizes based on mean differences between two groups (SMD, lnRR: section S4.2).
# importing dataset with correlation coefficients
dataset.r.original <- read.xlsx(here("data", "ft044.xlsx"), colNames = TRUE, sheet = 1)
# transforming r to Zr for the analyses, and estimating VZr for the weighting
dataset.r.original$Sample.size <- as.numeric(dataset.r.original$Sample.size)
# subset dataset to those with 4 or more individuals (this removes two rows, one
# without sample size and one with sample size = 3)
dataset.r.original <- dataset.r.original[dataset.r.original$Sample.size > 3 & !(is.na(dataset.r.original$Sample.size)),
]
dataset.r.original$yi <- atanh(dataset.r.original$Correlation)
dataset.r.original$vi <- 1/(as.numeric(dataset.r.original$Sample.size) - 3) #the effect based on 3 individuals won't be able to be analyzed because VZr is then infinite (see VZr function above to understand). Also, there is a missing sample size.
# renaming some variables to make it easier
dataset.r.original$study <- dataset.r.original$Reference
dataset.r.original <- dataset.r.original[, !names(dataset.r.original) %in% c("Reference")]
For our first worked example, we use the dataset provided by Garamszegi et al. (2012), who tested the behavioural syndrome hypothesis (i.e. that behaviours are correlated) across studies and species, which they found support for. Here, we reanalyse the data from Garamszegi et al. (2012) by first conducting a phylogenetic multilevel intercept-only meta-analytic model and then testing for evidence of publication bias following the approaches outlined in the main text.
Since multiple species (n = 65 species) are included in this dataset, we need to account for phylogenetic non-independence in our statistical models (Chamberlain et al. 2012, Cinar et al. 2020). For that, we first search for the the species names in the Open Tree Taxonomy (Rees and Cranston 2017) to confirm and correct species names, then build a phylogenetic tree for all the species included in the dataset (Figure S1) by retrieving their phylogenetic relationships from the Open Tree of Life (Hinchliff et al. 2015) using the R package rotl (Michonneau et al. 2016). We then estimate branch lengths following Grafen (1989) as implemented in the function ‘compute.brlen()’ of the R package ape (Paradis and Schliep 2019). Finally, we construct a phylogenetic relatedness correlation matrix that will be fitted as part of the random effect structure of our models (see below).
# fixing a typo in the original list of species names
dataset.r.original$Species <- ifelse(dataset.r.original$Species == "Acrochephalus schoenobaenus",
"Acrocephalus schoenobaenus", dataset.r.original$Species)
# updating a name that was not being well recognized by the Open Tree Taxonomy
dataset.r.original$Species <- ifelse(dataset.r.original$Species == "Eurotestudo boettgeri",
"Testudo boettgeri", dataset.r.original$Species)
# fixing another typo in the original list of species names
dataset.r.original$Species <- ifelse(dataset.r.original$Species == "Taenopygia guttata",
"Taeniopygia guttata", dataset.r.original$Species)
# # obtaining dataframe listing the Open Tree identifiers potentially matching
# our list of species (be aware that this will take a few minutes, and you can
# load the data below) taxa <- tnrs_match_names(names =
# unique(dataset.r.original$Species))
# # saving the taxonomic data created on the 18th of February 2021 to speed the
# process in the future and make the code fully reproducible if taxonomic changes
# are implemented in the future save(taxa,file =
# 'taxa_Open_Tree_of_Life_20210218.RData')
# loading the taxonomic data (taxa) created on the 18th of February 2021
load(here("data", "taxa_Open_Tree_of_Life_20210218.RData"))
# # check approximate matches taxa[taxa$approximate_match==TRUE,] # check
# synonyms matches taxa[taxa$is_synonym==TRUE,] # check number of matches
# taxa[taxa$number_matches>1,]
# # some further checks ott_id_tocheck <- taxa[taxa$number_matches != 1,'ott_id']
# for(i in 1:length(ott_id_tocheck)){ print(inspect(taxa, ott_id =
# ott_id_tocheck[i])) }
# all phylogenetic data seems in order now
# however, the ott_id for Parma unifasciata cannot be found when retrieving the
# phylogenetic relationships, so to trick this we are going to use the ott_id of
# another Parma species, Parma oligolepis (ott_id = 323186), which in this case
# it is fine because we only include two species belonging to the genus Parma,
# and therefore, the phylogenetic relationship will be the same for our purposes
# tnrs_match_names(names = 'Parma oligolepis') taxa[taxa$unique_name=='Parma
# unifasciata','ott_id'] <- 323186
# # retrieving phylogenetic relationships among taxa in the form of a trimmed
# sub-tree (you can simply load the tree at the bottom) tree <-
# tol_induced_subtree(ott_ids = taxa[['ott_id']], label_format = 'name')
# # we need to check for the existence of polytomies is.binary.tree(tree) # No
# polytomies, so we can proceed.
# to confirm that our tree covers all the species we wanted it to include, and
# make sure that the species names in our database match those in the tree, we
# use the following code
# tree$tip.label <- gsub('_',' ', tree$tip.label)
# intersect(as.character(tree$tip.label),
# as.character(dataset.r.original$Species))
# setdiff(as.character(dataset.r.original$Species), as.character(tree$tip.label))
# #listed in our database but not in the tree
# setdiff(as.character(tree$tip.label),as.character(dataset.r.original$Species))
# # listed in the tree but not in our database
# All but Pan and Parma oligolepis are the same species, the 'problem' is that
# synonyms have been used in the tree. We are going to leave all the names as in
# Open Tree of Life except Pan and Parma oligolepis, which we are going to
# substitute by Pan troglodytes and Parma unifasciata
# # we start by fixing the following names in the tree
# tree$tip.label[tree$tip.label=='Pan']<-'Pan troglodytes'
# tree$tip.label[tree$tip.label=='Parma oligolepis']<-'Parma unifasciata'
# setdiff(as.character(dataset.r.original$Species), as.character(tree$tip.label))
# #listed in our database but not in the tree
# setdiff(as.character(tree$tip.label),as.character(dataset.r.original$Species))
# # listed in the tree but not in our database
# changing the names in our database to follow those in the tree. We are creating
# a new Species.updated variable so that it is clear that this list of Species is
# an updated version compared to the original one
dataset.r.original$Species.updated <- dataset.r.original$Species
dataset.r.original$Species.updated <- ifelse(dataset.r.original$Species.updated ==
"Carduelis chloris", "Chloris chloris", dataset.r.original$Species.updated)
dataset.r.original$Species.updated <- ifelse(dataset.r.original$Species.updated ==
"Dendroica pensylvaniaca", "Setophaga pensylvanica", dataset.r.original$Species.updated)
dataset.r.original$Species.updated <- ifelse(dataset.r.original$Species.updated ==
"Sylvia melanocephala", "Curruca melanocephala", dataset.r.original$Species.updated)
# we are also updating the original Species variable since we have to fit both
# Species and Species.updated in our models (see below)
dataset.r.original$Species <- dataset.r.original$Species.updated
# setdiff(as.character(dataset.r.original$Species.updated),
# as.character(tree$tip.label)) #listed in our database but not in the tree
# setdiff(as.character(tree$tip.label),as.character(dataset.r.original$Species.updated))
# # listed in the tree but not in our database
# all in order
# we can now save the tree save(tree, file = 'tree_20211802.Rdata')
dataset.r <- dataset.r.original
# load the saved tree (tree)
load(here("data", "tree_20211802.Rdata"))
# compute branch lengths of tree
phylo_branch <- compute.brlen(tree, method = "Grafen", power = 1)
# # check tree is ultrametric is.ultrametric(phylo_branch) # TRUE
# matrix to be included in the models
phylo_cor <- vcv(phylo_branch, cor = T)
Phylogenetic tree of all species included in the meta-analytic dataset. Notice that some of the names shown in the tree correspond to the most updated synonyms according to the Open Tree Taxonomy (Rees and Cranston 2017) of those originally avialable in the dataset of Garamszegi et al. (2012).
# we can then plot the tree
plot(tree, type = "fan", cex = 0.65, label.offset = 0.05, no.margin = TRUE) #check: https://www.rdocumentation.org/packages/ape/versions/5.3/topics/plot.phylo
Before running any meta-analytic model, it is important to explore the meta-analytic dataset in search of potential outliers that could represent data extraction errors. To do so, we can draw funnel plots showing the standard error and the inverse of the standard error (i.e. precision) as the y-axes (Figure S4.2). From these funnel plots, we conclude that no clear outliers seem to be present in this meta-analytic dataset.
Funnel plots with SE (left-hand side) and precision (1/SE; right-hand side) as measures of uncertainty to explore the existence of outliers in the meta-analytic dataset.
par(mfrow = c(1, 2))
funnel(dataset.r$yi, dataset.r$vi, yaxis="sei",
#xlim = c(-3, 3),
xlab = "Effect size (Zr)")
funnel(dataset.r$yi, dataset.r$vi, yaxis="seinv",
#xlim = c(-3, 3),
xlab = "Effect size (Zr)")
Now that we have created a matrix with the phylogenetic relationships among species and have confirmed that no outliers seem to be present, we can then use the dataset from Garamszegi et al. (2012) to provide a worked example on how to test of publication bias using the multilevel meta-regression method suggested in the main text. First, let’s have a look at the detailed results of a multilevel phylogenetic meta-analytic (intercept-only) model (Table S4.1), including the corresponding funnel plot showing the overall effect or meta-analytic mean (Figure S4.3).
# creating a unit-level random effect to model residual variance in metafor
dataset.r$obsID <- 1:nrow(dataset.r)
# running multilevel intercept-only meta-analytic model
meta.analysis.model.r <- rma.mv(yi, vi,
mods=~1,
random=list(~ 1 | Species.updated, # phylo effect
~ 1 | Species, # non-phylo effect
~ 1 | study,
~ 1 | obsID),
R = list(Species.updated = phylo_cor), #phylogenetic relatedness
method="REML",
test="t", # using t dist rather than z because t is a better, more conservative distribution than z
data=dataset.r)
# print(meta.analysis.model.r, digits=3)
# extracting the mean, 95% confidence intervals and 95% prediction intervals
metaanalytic.mean.model.r <- predict(meta.analysis.model.r, digits=3)
# estimating I2 as measure of heterogeneity
I2.model.r <- i2_ml(meta.analysis.model.r, method = "wv")
# creating a data.frame with the meta-analytic mean and heterogeneity estimates
table.model.r <- data.frame(n=length(unique(dataset.r$study)),
k=nrow(dataset.r),
mean=round(metaanalytic.mean.model.r[[1]],2),
CI=paste0("[",round(metaanalytic.mean.model.r[[3]],2),",",round(metaanalytic.mean.model.r[[4]],2),"]"),
PI=paste0("[",round(metaanalytic.mean.model.r[[5]],2),",",round(metaanalytic.mean.model.r[[6]],2),"]"),
I2_obsID=round(I2.model.r[["I2_obsID"]]*100,1),
I2_paperID=round(I2.model.r[["I2_study"]]*100,1),
I2_nonphylo=round(I2.model.r[["I2_Species"]]*100,1),
I2_phylo=round(I2.model.r[["I2_Species.updated"]]*100,1),
I2_total=round(I2.model.r[["I2_total"]]*100,1)
)
# creating a fancy table using the R package 'gt'
table.model.r.gt <- table.model.r %>%
gt() %>%
cols_label(n=md("**n**"),
k=md("**k**"),
mean=md("**Meta-analytic mean**"),
CI=md("**95% CI**"),
PI=md("**95% PI**"),
I2_obsID=md("***I*<sup>2</sup><sub>residual</sub>\n(%)**"),
I2_paperID=md("***I*<sup>2</sup><sub>study</sub>\n(%)**"),
I2_nonphylo=md("***I*<sup>2</sup><sub>non-phylogeny</sub>\n(%)**"),
I2_phylo=md("***I*<sup>2</sup><sub>phylogeny</sub>\n(%)**"),
I2_total=md("***I*<sup>2</sup><sub>total</sub> \n(%)**"),
) %>%
cols_align(align = "center") %>%
tab_source_note(source_note = md("n = number of studies; k = number of effects; CI = confidence interval; PI = prediction interval; *I*<sup>2</sup> = heterogeneity")) #%>%
# tab_options(table.width=770)
table.model.r.gt
| n | k | Meta-analytic mean | 95% CI | 95% PI | I2residual (%) | I2study (%) | I2non-phylogeny (%) | I2phylogeny (%) | I2total (%) |
|---|---|---|---|---|---|---|---|---|---|
| 88 | 104 | 0.29 | [0.17,0.41] | [-0.25,0.83] | 15.9 | 29.5 | 20.9 | 8.1 | 74.3 |
| n = number of studies; k = number of effects; CI = confidence interval; PI = prediction interval; I2 = heterogeneity | |||||||||
Table S4.1. Results of the phylogenetic multilevel intercept-only meta-analysis testing the relationship between behaviours across species. Estimates are presented as standardized effect sizes using Fisher’s transformation (i.e. Zr).
Funnel plot using precision (1/SE) as a measure of uncertainty and showing the meta-analytic mean (vertical line) obtained in the phylogenetic multilevel intercept-only meta-analysis.
funnel(meta.analysis.model.r, yaxis = "seinv", xlab = "Effect size (Zr)")
To test for publication bias, we can first fit a phylogenetic multilevel meta-regression to explore whether there is some evidence of small-study effects in our meta-analytic data. To do so, we fit a uni-moderator phylogenetic multilevel meta-regression including the effect sizes’ standard errors (SE or sei) as the only moderator (see Equation 21 from the main text). This meta-regression will provide some information about the existence of small-study effects (i.e. asymmetry in the distribution of effect sizes), but the best evidence for small-study effects will come from the all-in publication bias test (multi-moderator meta-regression) described in section S4.1.4.3, since this multi-moderator meta-regression will test whether asymmetry exists after accouting for the heterogeneity explained by all the included moderators (more in the main text).
# creating a variable for the standard error of each effect size (i.e. the square root of the sampling variance, see Figure 1 from the main text)
dataset.r$sei <- sqrt(dataset.r$vi)
# Application of Equation 21 from the main text
publication.bias.model.r.se <- rma.mv(yi, vi,
mod =~1+sei,
random=list(~ 1 | Species.updated, # phylo effect
~ 1 | Species, # non-phylo effect
~ 1 | study,
~ 1 | obsID),
R = list(Species.updated = phylo_cor), #phylogenetic relatedness
method="REML",
test="t", # using t dist rather than z because t is a better, more conservative distribution than z
data=dataset.r)
# print(publication.bias.model.r.se,digits=3)
# extracting the mean and 95% confidence intervals
estimates.publication.bias.model.r.se <- estimates.CI(publication.bias.model.r.se)
According to this uni-moderator meta-regression, there is some indication of small-study effects since the slope of the moderator ‘SE’ is marginally statistically significant (slope = 0.75, 95% CI = [-0.01,1.51]; R2marginal = 13.4%), showing that effect sizes with larger SE (less uncertain effect sizes) tend to be larger. Nonetheless, as mentioned above, we will confirm the existence of small-study effects after accounting for some of the heterogeneity present in the data using the all-in publication bias test (multi-moderator meta-regression) described in section S4.1.4.3.
To test for time-lag bias, also called decline effects, we can first fit a phylogenetic multilevel meta-regression. To do so, we fit a uni-moderator phylogenetic multilevel meta-regression including the year of publication (mean-centred) as the only moderator (see Equation 23 from the main text). The estimated slope for year of publication will provide some evidence on whether effect sizes have become smaller over time since the first effect size was published; however, as above, the best evidence for such decline effects will come from the all-in publication bias test (multi-moderator meta-regression) described in section S4.1.4.3, since this multi-moderator meta-regression will test for the existence of decline effects after accouting for the heterogeneity explained by all the included moderators (more in the main text).
# creating a variable with the year of publication
dataset.r$year <- as.numeric(str_extract(dataset.r$study, "(\\d)+"))
# mean-centring year of publication to help with interpretation, particularly in S4.1.4.3.
dataset.r$year.c <- as.vector(scale(dataset.r$year, scale = F))
# Application of Equation 23 from the main manuscript
publication.bias.model.r.timelag <- rma.mv(yi, vi,
mods=~1+year.c,
random=list(~ 1 | Species.updated, # phylo effect
~ 1 | Species, # non-phylo effect
~ 1 | study,
~ 1 | obsID),
R = list(Species.updated = phylo_cor), #phylogenetic relatedness
method="REML",
test="t", # using t dist rather than z because t is a better, more conservative distribution than z
data=dataset.r)
# summary(publication.bias.model.r.timelag)
# extracting the mean and 95% confidence intervals
estimates.publication.bias.model.r.timelag <- estimates.CI(publication.bias.model.r.timelag)
According to this uni-moderator meta-regression, there is no indication of decline effects since the slope of the moderator ‘year of publication’ is essentially zero (slope = 0, 95% CI = [-0.01,0.01]; R2marginal = 0.3%). Nonetheless, as mentioned above, we need to confirm this pattern after accounting for some of the heterogeneity present in the data using the all-in publication bias test (multi-moderator meta-regression) described in section S4.1.4.3.
When heterogeneity exists, which will normally be the case, particularly so in ecology and evolution (Senior et al. 2016), it is best to combine Equations 21 and 23 with other moderators since those additional moderators will generally explain or be predicted to explain some of the heterogeneity. That is, this all-in publication bias test (multi-moderator meta-regression) would be the best test of small-study (publication bias) and decline effects (time-lag bias) in most meta-analytic datasets. For our data, we will run a multi-moderator phylogenetic multilevel meta-regression including the effect sizes’ standard errors, the year of publication (mean-centred) and a moderator originally included in Garamszegi et al. (2012), ‘captivity status.’ Although Garamszegi et al. (2012) included other moderators in their model (i.e. species, taxonomic class, spatial overlap, age, sex and season), we decided to focus only on those moderators that were available for the entire dataset (i.e. no NA’s, see Table 2 in Garamszegi et al. 2012), and we also excluded ‘species’ and ‘taxonomic class’ since those effects are modelled by our phylogenetic approach.
# Application of Equation 24 from the main manuscript
# Note that we are removing the intercept in this model (using '-1') just because the intercept is not of interest at this point and instead we prefer to see the slopes of the continues moderators and the estimates of all the levels of the categorical moderator 'CaptivityC'. Note that whether to remove or not the intercept does not change the conclusions of the small-study effects test or the decline effects test.
publication.bias.model.r.all.se <- rma.mv(yi, vi,
mods=~-1+ # -1 removes the intercept
sei+
year.c+
CaptivityC,
random=list(~ 1 | Species.updated, # phylo effect
~ 1 | Species, # non-phylo effect
~ 1 | study,
~ 1 | obsID),
R = list(Species.updated = phylo_cor), #phylogenetic relatedness
method="REML",
test="t", # using t dist rather than z because t is a better, more conservative distribution than z
data=dataset.r)
#summary(publication.bias.model.r.all.se)
# extracting the mean and 95% confidence intervals
estimates.publication.bias.model.r.all.se <- estimates.CI(publication.bias.model.r.all.se)
The all-in publication bias test confirms what we observed in the uni-moderator meta-regressions above (sections S4.1.4.1 and S4.1.4.2). First, the multi-moderator meta-regression shows a statistically significant slope for the moderator ‘SE’ (slope = 0.82, 95% CI = [0.03,1.61]; Figure S4.4), showing evidence of small-study effects. In other words, the largest effect sizes in the dataset tend to be those with the lowest precision (i.e. larger uncertainty; Figure S4.4). This result agrees with the findings originally reported in Garamszegi et al. (2012), where the authors found evidence of publication bias after applying a simple correlation method proposed by Begg and Mazumdar (1994) and the trim-and-fill method (Duval and Tweedie 2000a,b) - although bear in mind that we do not recommend to use those simpler methods for meta-analytic datasets with multiple levels of non-independence and high heterogeneity (more in the main text; see alternative in section S5). Second, the all-in publication bias test also confirms that there is no evidence of decline effects in the data since the slope of the moderator ‘year of publication’ was again indistinguishable from zero (slope = 0.01, 95% CI = [-0.01,0.02]; Figure S4.5). Overall, the moderators included in this multi-moderator meta-regression explained a total of 16.5% (i.e. R2marginal) of the heterogeneity observed in the meta-analytic model (see Table S4.1).
# to plot the 'sei' effect we are simply including the moderator 'CaptivityC' as a random effect for simplicity of plotting. Note that this does not substantially change the estimates of the moderators, and that alternative approaches could be used for this.
publication.bias.model.r.all.se.plot <- rma.mv(yi, vi,
mods=~1+
sei+
year.c,
random=list(~1 | CaptivityC,
~ 1 | Species.updated, # phylo effect
~ 1 | Species, # non-phylo effect
~ 1 | study,
~ 1 | obsID),
R = list(Species.updated = phylo_cor), #phylogenetic relatedness
method="REML",
test="t", # using t dist rather than z because t is a better, more conservative distribution than z
data=dataset.r)
Effect sizes with larger standard error tend to be larger, providing evidence of small-study effects in the meta-analytic dataset. The solid line represents the model estimate and the shading shows its 95% confidence intervals.
# predicting effect sizes for a range of SE from the minimum observed in the data
# to the maximum observed in data, while year is kept at its mean value, in this
# case 0 since year was mean-centred
predict.publication.bias.model.r.all.se.plot.1 <- predict(publication.bias.model.r.all.se.plot,
newmods = cbind(seq(min(dataset.r$sei), max(dataset.r$sei), 0.005), c(0)))
newdat <- data.frame(sei = seq(min(dataset.r$sei), max(dataset.r$sei), 0.005), fit = predict.publication.bias.model.r.all.se.plot.1$pred,
upper = predict.publication.bias.model.r.all.se.plot.1$ci.ub, lower = predict.publication.bias.model.r.all.se.plot.1$ci.lb,
stringsAsFactors = FALSE)
xaxis <- dataset.r$sei
yaxis <- dataset.r$yi
plot(xaxis, yaxis, type = "n", ylab = "", xlab = "", xaxt = "n", yaxt = "n", ylim = c(-2.25,
2.25), xlim = c(0, 1))
abline(a = 0, b = 0, lwd = 1, lty = 1)
axis(1, at = seq(0, 1, 0.1), cex.axis = 0.8, tck = -0.02)
axis(2, at = round(seq(-2.5, 2.25, 0.5), 1), cex.axis = 0.8, las = 2, tck = -0.02)
title(xlab = "standard error (SE)", ylab = "effect size (Zr)", line = 2.75, cex.lab = 1.4)
points(xaxis, yaxis, bg = rgb(0, 0, 0, 0.1), col = rgb(0, 0, 0, 0.2), pch = 21, cex = 1)
lines(newdat$sei, newdat$fit, lwd = 2.75, col = "darkorchid4")
polygon(c(newdat$sei, rev(newdat$sei)), c(newdat$lower, rev(newdat$upper)), border = NA,
col = rgb(104/255, 34/255, 139/255, 0.5))
The overall effect size has remained seemingly unchanged over time since the first effect size was published. The solid line represents the model estimate, the shading shows its 95% confidence intervals and individual effect sizes are scaled by their precision (1/SE).
# predicting effect sizes for a range of years from the minimum observed in the
# data to the maximum observed in data, while SE is kept at its mean value. Keep
# in mind that alternative ways could be used, for example, one could use
# median-centring rather than mean-centring.
predict.publication.bias.model.r.all.se.plot.2 <- predict(publication.bias.model.r.all.se.plot,
newmods = cbind(mean(dataset.r$sei), seq(min(dataset.r$year.c), max(dataset.r$year.c),
0.25)))
newdat <- data.frame(year = seq(min(dataset.r$year), max(dataset.r$year), 0.25),
fit = predict.publication.bias.model.r.all.se.plot.2$pred, upper = predict.publication.bias.model.r.all.se.plot.2$ci.ub,
lower = predict.publication.bias.model.r.all.se.plot.2$ci.lb, stringsAsFactors = FALSE)
xaxis <- dataset.r$year
yaxis <- dataset.r$yi
cex.study <- (1/dataset.r$sei)/4
plot(xaxis, yaxis, type = "n", ylab = "", xlab = "", xaxt = "n", yaxt = "n", ylim = c(-2.25,
2.25), xlim = c(min(dataset.r$year), max(dataset.r$year)))
abline(a = 0, b = 0, lwd = 1, lty = 1)
axis(1, at = seq(min(dataset.r$year), max(dataset.r$year), 5), cex.axis = 0.8, tck = -0.02)
axis(2, at = round(seq(-2.5, 2.25, 0.5), 1), cex.axis = 0.8, las = 2, tck = -0.02)
title(xlab = "year of publication", ylab = "effect size (Zr)", line = 2.75, cex.lab = 1.4)
points(xaxis, yaxis, bg = rgb(0, 0, 0, 0.1), col = rgb(0, 0, 0, 0.2), pch = 21, cex = cex.study)
lines(newdat$year, newdat$fit, lwd = 2.75, col = "darkorchid4")
polygon(c(newdat$year, rev(newdat$year)), c(newdat$lower, rev(newdat$upper)), border = NA,
col = rgb(104/255, 34/255, 139/255, 0.5))
Once we have confirmed that there is evidence of small-study effects (see above), the next step in our suggested two-step approach is to calculate the adjusted overall effect size, that is, the overall effect once we account for publication bias (see the main text). Bear in mind that, if the slope for the moderator ‘SE’ in the all-in publication bias test shown above was not statistically significant, then the best estimate for the adjusted overall effect size would be the intercept of that all-in publication bias test (i.e. the one with SE; Stanley and Doucouliagos 2012, 2014; more in the main text). Since the slope for the moderator ‘SE’ was statistically significant in this dataset, the approach to estimate the adjusted overall effect is to fit a similar multi-moderator regression to the one above but including the effect sizes’ sampling variances (SV or vi) instead of their standard errors as a moderator. The reason for using sampling variance instead of standard errors when calculating the adjusted overall effect when the slope for ‘SE’ is statistically signficant is because in that scenario using ‘SV’ leads to an adjusted overall effect that is less downwardly biased than when using ‘SE’ (more in the main text; Stanley and Doucouliagos 2012, 2014).
What moderators to include in this version of the all-in publication bias test depends on what we want the adjusted overall effect size to reflect, but also keeping in mind that one of the purposes of including moderators in this all-in test is to explain as much heterogeneity as possible. We are going to fit three different models to showcase some of these possibilities. Importantly, we should keep in mind that publication bias tests should be treated as sensitivity analyses to test the robustness of results from the original analysis (Noble et al., 2017) and that none of the estimated adjusted overall effects should be treated as the real overall value if publication bias did not exist; after all, we can never be sure of what has not been published. The three models that we are going to fit are:
Multi-moderator meta-regression only including sampling variance and year of publication (mean-centred) as moderators. The intercept of this meta-regression will represent the adjusted overall effect size.
Multi-moderator meta-regression including sampling variance and year of publication (mean-centred) as moderators, and ‘captivity status,’ which has five levels, as a random effect. The goal of this meta-regression is to provide an adjusted overall effect size after accouting for the variance explained by ‘captivity status.’ Bear in mind that other alternative approaches might be used (e.g. mean-centring the categorical factor following Schielzeth 2010), but results should not change substantially.
Multi-moderator meta-regression including sampling variance, year of publication (mean-centred) and the categorical factor ‘captivity status,’ which has five levels, as moderators. In this meta-regression, we are going to remove the intercept so that we obtain an adjusted overall effect for each of the levels of the factor ‘captivity status.’
# model 1: ignoring CaptivityC
publication.bias.model.r.v.1 <- rma.mv(yi, vi,
mods=~1+vi +
year.c,
random=list(~ 1 | Species.updated, # phylo effect
~ 1 | Species, # non-phylo effect
~ 1 | study,
~ 1 | obsID),
R = list(Species.updated = phylo_cor), #phylogenetic relatedness
method="REML",
test="t", # using t dist rather than z because t is a better, more conservative distribution than z
data=dataset.r)
# model 2: adding CaptivityC as a random effect to provide an adjusted overall effect across levels
publication.bias.model.r.v.2 <- rma.mv(yi, vi,
mods=~1+vi +
year.c, # take Class - that is already a part of phylo
random=list(~ 1 | Species.updated, # phylo effect
~ 1 | Species, # non-phylo effect
~ 1 | study,
~ 1 | obsID,
~ 1 | CaptivityC),
R = list(Species.updated = phylo_cor), #phylogenetic relatedness
method="REML",
test="t", # using t dist rather than z because t is a better, more conservative distribution than z
data=dataset.r)
# model 3: providing an adjusted effect for each level of CaptivityC
publication.bias.model.r.v.3 <- rma.mv(yi, vi,
mods=~-1+vi +
year.c +
CaptivityC, # take Class - that is already a part of phylo
random=list(~ 1 | Species.updated, # phylo effect
~ 1 | Species, # non-phylo effect
~ 1 | study,
~ 1 | obsID),
R = list(Species.updated = phylo_cor), #phylogenetic relatedness
method="REML",
test="t", # using t dist rather than z because t is a better, more conservative distribution than z
data=dataset.r)
# model 3b: comparing results without correcting for publication bias
publication.bias.model.r.v.3b <- rma.mv(yi, vi,
mods=~-1+
CaptivityC, # take Class - that is already a part of phylo
random=list(~ 1 | Species.updated, # phylo effect
~ 1 | Species, # non-phylo effect
~ 1 | study,
~ 1 | obsID),
R = list(Species.updated = phylo_cor), #phylogenetic relatedness
method="REML",
test="t", # using t dist rather than z because t is a better, more conservative distribution than z
data=dataset.r)
# extracting the mean and 95% confidence intervals
estimates.publication.bias.model.r.v.1 <- estimates.CI(publication.bias.model.r.v.1)
estimates.publication.bias.model.r.v.2 <- estimates.CI(publication.bias.model.r.v.2)
estimates.publication.bias.model.r.v.3 <- estimates.CI(publication.bias.model.r.v.3)
estimates.publication.bias.model.r.v.3b <- estimates.CI(publication.bias.model.r.v.3b)
Model 1 estimated an adjusted overall effect size that was slightly smaller (adjusted effect = 0.26, 95% CI = [0.12,0.4]) than that obtained by the multilevel meta-analysis (unadjusted effect = 0.29, 95% CI = [0.17,0.41]; more in section S4.1.3), highlithing the downward correction of the overall effect after accounting for publication bias. For our worked example, model 2 provides the same exact adjusted overall effect (adjusted effectcaptivity = 0.26, 95% CI = [0.12,0.4]) than model 1 since the variance explained by the random effect ‘captivity status’ is zero. The latter result agrees with the low variance explained by ‘captivity status’ when included as a moderator (R2marginal = 2.3%). Ideally, the moderators included in the all-in publication bias test would explain more variance, but this will depend on the specific meta-analytic dataset being analyzed. Last, model 3 also showed slightly smaller effects for all levels of the categorical factor ‘captivity status’ that the unadjusted model (Table S4.2).
# creating a table to show the results of model 2
table.comparing.captivity.levels <- merge(estimates.publication.bias.model.r.v.3[c(3:7),
], estimates.publication.bias.model.r.v.3b, by = "estimate", all.x = T)
# rounding estimates
table.comparing.captivity.levels <- table.comparing.captivity.levels %>% mutate_at(vars(mean.x,
lower.x, upper.x, mean.y, lower.y, upper.y), funs(round(., 2)))
table.comparing.captivity.levels <- data.frame(captivity.level = table.comparing.captivity.levels[,
1], adjusted.mean = table.comparing.captivity.levels[, 2], adjusted.CI = paste0("[",
table.comparing.captivity.levels[, 3], ",", table.comparing.captivity.levels[,
4], "]"), unadjusted.mean = table.comparing.captivity.levels[, 5], unadjusted.CI = paste0("[",
table.comparing.captivity.levels[, 6], ",", table.comparing.captivity.levels[,
7], "]"))
table.comparing.captivity.levels[, 1] <- c("FC", "FW", "WCT", "WCT & FC", "WCT & FW")
# creating a fancy table using the R package 'gt'
table.model.r.captivity.gt <- table.comparing.captivity.levels %>% gt() %>% cols_label(captivity.level = md("**Captivity status**"),
adjusted.mean = md("**Adjusted mean**"), adjusted.CI = md("**Adjusted 95% CI**"),
unadjusted.mean = md("**Unadjusted mean**"), unadjusted.CI = md("**Unadjusted 95% CI**")) %>%
cols_align(align = "center")
table.model.r.captivity.gt
| Captivity status | Adjusted mean | Adjusted 95% CI | Unadjusted mean | Unadjusted 95% CI |
|---|---|---|---|---|
| FC | 0.24 | [0.1,0.39] | 0.28 | [0.15,0.41] |
| FW | 0.24 | [0.01,0.46] | 0.27 | [0.05,0.49] |
| WCT | 0.28 | [0.12,0.43] | 0.30 | [0.16,0.45] |
| WCT & FC | 0.43 | [0.08,0.77] | 0.43 | [0.09,0.78] |
| WCT & FW | 0.12 | [-0.16,0.41] | 0.16 | [-0.12,0.43] |
Table S4.2. Adjusted and unadjusted for publication bias results for all levels of the moderator ‘captivity status’ using the all-in publication bias test (multi-moderator meta-regression). Estimates are presented as standardized effect sizes using Fisher’s transformation (i.e. Zr).
For our second worked example, we used the dataset provided by Murphy et al. (2013) (ref: A meta-analysis of declines in local species richness from human disturbances), who tested the effects of human-caused disturbances on changes in species richness across both terrestrial and aquatic biomes, using log response ratio (i.e., lnRR). Briefly, the authors found that human-mediated disturbances lead to an statistically significant 18.3% decline in species richness across five anthropogenic disturbances (e.g., species invasions, nutrient addition, temperature increase, habitat loss or fragmentation, and land-use change). Here, we reanalyse the data from Murphy et al. (2013) by: (1) conducting a multilevel intercept-only meta-analytic model using lnRR and SMD as dependent variables, respectively; (2) testing for evidence of publication bias following the approaches outlined in the main text (meta-regression approach); (3) obtaining an adjusted overall mean using the approaches outlined in the main text (two-stage approach based on meta-regression approach).
# importing dataset with mean differences
dataset.original.mean.diff <- read_excel(here("data", "ft027.xlsx"), col_names = TRUE)
# names(dataset.original.mean.diff) str(dataset.original.mean.diff)
# lnRR
dataset.rr <- escalc(measure = "ROM", m1i = T_mean, m2i = C_mean, sd1i = T_sd, sd2i = C_sd,
n1i = T_N, n2i = C_N, data = dataset.original.mean.diff, append = T)
dataset.smd <- escalc(measure = "SMD", m1i = T_mean, m2i = C_mean, sd1i = T_sd, sd2i = C_sd,
n1i = T_N, n2i = C_N, data = dataset.original.mean.diff, append = T)
# metrics <- data.frame(RR = lnRR$yi, VRR = lnRR$vi, SMD = SMD$yi, VSMD = SMD$vi)
# dataset.mean.diff <- cbind(dataset.original.mean.diff, metrics) # bind_cols()
# clean NA
dataset.rr <- dataset.rr[!is.na(dataset.rr$yi) & dataset.rr$vi != 0, ]
dataset.smd <- dataset.smd[!is.na(dataset.smd$yi) & dataset.smd$vi != 0, ]
Before reanalysis, it is important to explore the meta-analytic dataset in search of potential outliers that might be caused by data extraction errors. To do so, we drew funnel plots showing the standard error and the inverse of the standard error (i.e. precision) in the y-axes (Figure S4.6). Figure S4.6 showed that outliers seemed to be present in this meta-analytic dataset.
Funnel plots with 1/SE for lnRR (left-hand side) and SMD (right-hand side) as measures of uncertainty to explore the existence of outliers in the meta-analytic dataset.
# here it is OK - but we find some strange data in SMD and lnRR
par(mfrow = c(1, 2))
funnel(dataset.rr$yi, dataset.rr$vi, yaxis="seinv",
#xlim = c(-3, 3),
ylab = "Precision (1/SE)",
xlab = "Effect size (lnRR)")
funnel(dataset.smd$yi, dataset.smd$vi, yaxis="seinv",
#xlim = c(-3, 3),
ylab = "Precision (1/SE)",
xlab = "Effect size (SMD)")
By checking the dataset, We found the minimum lnRR and SMD were usual: the minimum lnRR is -2.79 and SMD -11.3. We decided to taking the minimum values out. Then we checked the funnel plots again (Figure S4.7). This time, funnel plots seemed to be normal.
Funnel plots with 1/SE as the measure of uncertainty for lnRR (left-hand side) and SMD (right-hand side) when taking out outliers in the meta-analytic dataset.
# taking outilers
dataset.rr <- dataset.rr[dataset.rr$vi != min(dataset.rr$vi), ]
dataset.smd <- dataset.smd[dataset.smd$yi != min(dataset.smd$yi),]
par(mfrow = c(1, 2))
funnel(dataset.rr$yi, dataset.rr$vi, yaxis="seinv",
#xlim = c(-3, 3),
ylab = "Precision (1/SE)",
xlab = "Effect size (lnRR)")
funnel(dataset.smd$yi, dataset.smd$vi, yaxis="seinv",
#xlim = c(-3, 3),
ylab = "Precision (1/SE)",
xlab = "Effect size (SMD)")
To avoide ‘artefactual’ funnel asymmetry, we suggest using “effective sample size” rather than SE for mean difference effect sizes, such as lnRR and SMD. Let’s check the funnel plots (Figure S4.8) which use effective sample size as y-axis.
Funnel plots with effective sample size as the measure of uncertaintyfor lnRR (left-hand side) and SMD (right-hand side).
# calculating "effective sample size" to account for unbalanced sampling, for SMD and lnRR (e_n, hereafter, see Equation 25 from the main manuscript)
dataset.rr$e_n <- 4*(dataset.rr$C_N*dataset.rr$T_N) / (dataset.rr$C_N + dataset.rr$T_N)
dataset.smd$e_n <- 4*(dataset.smd$C_N*dataset.smd$T_N) / (dataset.smd$C_N + dataset.smd$T_N)
#using effecitve sampling size
par(mfrow = c(1, 2))
funnel(dataset.rr$yi, dataset.rr$vi, ni = dataset.rr$e_n, yaxis="ni",
#xlim = c(-3, 3),
ylab = "Effective sample size",
xlab = "Effect size (lnRR)")
funnel(dataset.smd$yi, dataset.smd$vi, ni = dataset.smd$e_n, yaxis="ni",
#xlim = c(-3, 3),
ylab = "Effective sample size",
xlab = "Effect size (SMD)")
When comparing funnel plots with 1/SE (Figure 4.7) and funnel plots with effective sample size (Figure 4.8), it was obvious that data points in funnel plots with effective sample size looked more ‘scattered,’ whereas data points in funnel plots with 1/SE looked more ‘crowded.’ This is because when using 1/SE as y-axis, that is, SE inherently correlate with effect size (e.g. lnRR and SMD). The SMD’s variance has the square of the point estimate (i.e. SMD; Equations 2), which leads to a correlation between SMDs and sampling SE, resulting in ‘artefactual’ funnel asymmetry. lnRR has the same issue (see Equation 4).
Next, we used the dataset from Murphy et al. (2013) to provide a worked example on how to test publication bias using the multilevel meta-regression method suggested in the main text. First, we reanalysed the dataset using a multilevel meta-analytic (intercept-only) model and corresponding funnel plot showing the overall effect or meta-analytic mean.
# creating a unit-level random effect to model residual variance in metafor
dataset.rr$obsID <- 1:nrow(dataset.rr)
dataset.smd$obsID <- 1:nrow(dataset.smd)
# running multilevel intercept-only meta-analytic model
meta.analysis.model.rr <- rma.mv(yi, vi,
mods = ~ 1,
random = list(~1 |Disturbance_category,
~ 1 | paperID,
~ 1 | obsID),
method = "REML",
test = "t", # using t dist rather than z because t is a better, more conservative distribution than z
data = dataset.rr)
meta.analysis.model.smd <- rma.mv(yi, vi,
mods = ~ 1,
random = list(~1 |Disturbance_category,
~ 1 | paperID,
~ 1 | obsID),
method = "REML",
test = "t", # using t dist rather than z because t is a better, more conservative distribution than z
data = dataset.smd)
# extracting the mean, 95% confidence intervals and 95% prediction intervals
#print(meta.analysis.model.1, digits=3)
metaanalytic.mean.model.rr <- predict(meta.analysis.model.rr, digits=3)
metaanalytic.mean.model.smd <- predict(meta.analysis.model.smd, digits=3)
#forest(meta.analysis.model.1)
# estimating relative heterogeneity I2
I2.model.rr <- i2_ml(meta.analysis.model.rr)*100
I2.model.smd <- i2_ml(meta.analysis.model.smd)*100
# and absolute heterogeneity Q
#Q.model.RR <- c(meta.analysis.model.RR$QE)
#Q.model.SMD <- c(meta.analysis.model.SMD$QE)
# creating a table to show the heterogeneity estimates
table.model.rr <- data.frame(n=length(unique(dataset.rr$paperID)),
k=nrow(dataset.rr),
mean=round(metaanalytic.mean.model.rr[[1]],2),
CI=paste0("[",round(metaanalytic.mean.model.rr[[3]],2),",",round(metaanalytic.mean.model.rr[[4]],2),"]"),
PI=paste0("[",round(metaanalytic.mean.model.rr[[5]],2),",",round(metaanalytic.mean.model.rr[[6]],2),"]"),
I2_obsID=round(I2.model.rr[[3]],1),
I2_paperID=round(I2.model.rr[[2]],1),
I2_total=round(I2.model.rr[[2]],1))
rownames(table.model.rr) <- NULL
#write.csv(table.model.RR, "./table/table.model.RR.csv", row.names = FALSE)
table.model.smd <- data.frame(n=length(unique(dataset.smd$paperID)),
k=nrow(dataset.smd),
mean=round(metaanalytic.mean.model.smd[[1]],2),
CI=paste0("[",round(metaanalytic.mean.model.smd[[3]],2),",",round(metaanalytic.mean.model.smd[[4]],2),"]"),
PI=paste0("[",round(metaanalytic.mean.model.smd[[5]],2),",",round(metaanalytic.mean.model.smd[[6]],2),"]"),
I2_obsID=round(I2.model.smd[[3]],1),
I2_paperID=round(I2.model.smd[[2]],1),
I2_total=round(I2.model.smd[[2]],1))
rownames(table.model.smd) <- NULL
#write.csv(table.model.SMD, "./table/table.model.SMD.csv", row.names = FALSE)
# creating a fancy table using the R package 'gt'
table.model.rr.gt <- table.model.rr %>%
gt() %>%
cols_label(n=md("**n**"),
k=md("**k**"),
mean=md("**Meta-analytic mean**"),
CI=md("**95% CI**"),
PI=md("**95% PI**"),
I2_obsID=md("***I*<sup>2</sup><sub>residual</sub> (%)**"),
I2_paperID=md("***I*<sup>2</sup><sub>study</sub> (%)**"),
I2_total=md("***I*<sup>2</sup><sub>total</sub> (%)**")) %>%
cols_align(align = "center") %>%
tab_source_note(source_note = md("n = number of studies; k = number of effects; CI = confidence interval; PI = prediction interval; *I*<sup>2</sup> = heterogeneity")) #%>%
#tab_options(table.width=775)
table.model.rr.gt
| n | k | Meta-analytic mean | 95% CI | 95% PI | I2residual (%) | I2study (%) | I2total (%) |
|---|---|---|---|---|---|---|---|
| 243 | 325 | -0.16 | [-0.24,-0.09] | [-0.89,0.56] | 32.7 | 4 | 4 |
| n = number of studies; k = number of effects; CI = confidence interval; PI = prediction interval; I2 = heterogeneity | |||||||
table.model.smd.gt <- table.model.smd %>%
gt() %>%
cols_label(n=md("**n**"),
k=md("**k**"),
mean=md("**Meta-analytic mean**"),
CI=md("**95% CI**"),
PI=md("**95% PI**"),
I2_obsID=md("***I*<sup>2</sup><sub>residual</sub> (%)**"),
I2_paperID=md("***I*<sup>2</sup><sub>study</sub> (%)**"),
I2_total=md("***I*<sup>2</sup><sub>total</sub> (%)**")) %>%
cols_align(align = "center") %>%
tab_source_note(source_note = md("n = number of studies; k = number of effects; CI = confidence interval; PI = prediction interval; *I*<sup>2</sup> = heterogeneity")) #%>%
#tab_options(table.width=775)
table.model.smd.gt
| n | k | Meta-analytic mean | 95% CI | 95% PI | I2residual (%) | I2study (%) | I2total (%) |
|---|---|---|---|---|---|---|---|
| 240 | 321 | -0.49 | [-0.64,-0.35] | [-2.54,1.55] | 57.3 | 0 | 0 |
| n = number of studies; k = number of effects; CI = confidence interval; PI = prediction interval; I2 = heterogeneity | |||||||
Table S4.2 Results of the multilevel intercept-only meta-analysis testing the effects of human-caused disturbances on changes in species richness across both terrestrial and aquatic biomes, using log response ratio (i.e., lnRR). Estimates are presented as standardized effect sizes using log transformed ratio of means (i.e. lnRR) and standardized mean difference (i.e. SMD)
Funnel plot using effective sample size as a measure of uncertainty and showing the meta-analytic mean (vertical line) obtained in the multilevel intercept-only meta-analysis.
par(mfrow = c(1, 2))
funnel(meta.analysis.model.rr, yaxis="ni",
#xlim = c(-3, 3),
ylab = "Effective sample size",
xlab = "Effect size (lnRR)")
funnel(dataset.smd$yi, dataset.smd$vi, yaxis="ni",
#xlim = c(-3, 3),
ylab = "Effective sample size",
xlab = "Effect size (SMD)")
To test for publication bias (evidence of small-study effects), we first fitted a uni-moderator multilevel meta-regression including the square root of the inverse of effective sample size [sqrt(inv_n_tilda)] (Equation 26) as the only moderator (see Equation 27 from the main text). This meta-regression will provide some information about the existence of small-study effects (i.e. asymmetry in the distribution of effect sizes) after accounting for non-independence. However, high levels of heterogeneity also poses problems for publication bias tests in meta-analytic datasets in ecology and evolution. Therefore, the best evidence approach is to model both heterogeneity and non-independence, by using multi-moderator multilevel meta-regression which includes all the important moderators (see all-in publication bias test described in section S4.2.4.3).
# calculating "effective sample size" to account for unbalanced sampling, for SMD and lnRR (see Equation 25 from the main manuscript)
# getting the inverse of it
dataset.rr$inv_n_tilda <- (dataset.rr$C_N + dataset.rr$T_N)/(dataset.rr$C_N*dataset.rr$T_N)
dataset.rr$sqrt_inv_n_tilda <- sqrt(dataset.rr$inv_n_tilda)
dataset.smd$inv_n_tilda <- (dataset.smd$C_N + dataset.smd$T_N)/(dataset.smd$C_N*dataset.smd$T_N)
dataset.smd$sqrt_inv_n_tilda <- dataset.smd$inv_n_tilda
# Application of Equation 27 from the main manuscript
publication.bias.model.rr.srin <- rma.mv(yi, vi,
mods= ~ 1 + sqrt_inv_n_tilda,
random=list(~1 |Disturbance_category,
~ 1 | obsID,
~ 1 | paperID),
method="REML",
test = "t", # using t dist rather than z because t is a better, more conservative distribution than z
data=dataset.rr)
#publication.bias.model.RR.srin
publication.bias.model.smd.srin <- rma.mv(yi, vi,
mods= ~ 1 + sqrt_inv_n_tilda,
random=list(~1 |Disturbance_category
,~ 1 | obsID,
~ 1 | paperID),
method="REML",
test = "t", # using t dist rather than z because t is a better, more conservative distribution than z
data=dataset.smd)
#publication.bias.model.SMD.srin
#print(publication.bias.model.1.se,digits=3)
# extracting the mean and 95% confidence intervals
estimates.publication.bias.model.rr.srin <- estimates.CI(publication.bias.model.rr.srin)
estimates.publication.bias.model.smd.srin <- estimates.CI(publication.bias.model.smd.srin)
We found that there was a indication of small-study effects since the slopes of the moderator ‘sqrt(inv_n_tilda)’ is marginally statistically significant for lnRR (slope = -0.26, 95% CI = [-0.49,-0.03]; R2marginal = 2.3%) and statistically significant for SMD (slope = -0.66, 95% CI = [-1.36,0.05]; R2marginal = 2%), which showed that effect sizes with larger uncertainty [larger sqrt(inv_n_tilda)] tend to be larger. Nonetheless, as mentioned above, we will confirm whether the small-study effects exist after accounting for some of the heterogeneity present in the data using the all-in publication bias test (multi-moderator meta-regression) described in section S4.2.4.3.
For Time-lag bias test, also called decline effects, we used the same approach described the above but including the year of publication (mean-centred) as the only moderator (see Equation 23 from the main text)。
# mean-centring year of publication to help with interpretation, particularly in S4.1.4.3.
dataset.rr$year.c <- as.vector(scale(dataset.rr$year, scale = F))
dataset.smd$year.c <- as.vector(scale(dataset.smd$year, scale = F))
# Application of Equation 23 from the main manuscript
publication.bias.model.rr.timelag <- rma.mv(yi, vi,
mods= ~ 1 + year.c,
random=list(~1 |Disturbance_category,
~ 1 | obsID,
~ 1 | paperID),
method="REML",
test = "t", # using t dist rather than z because t is a better, more conservative distribution than z
data=dataset.rr)
publication.bias.model.smd.timelag <- rma.mv(yi, vi,
mods= ~ 1 + year.c,
random=list(~1 |Disturbance_category,
~ 1 | obsID,
~ 1 | paperID),
method="REML",
test = "t", # using t dist rather than z because t is a better, more conservative distribution than z
data=dataset.smd)
#summary(publication.bias.model.1.timelag)
# extracting the mean and 95% confidence intervals
estimates.publication.bias.model.rr.timelag <- estimates.CI(publication.bias.model.rr.timelag)
estimates.publication.bias.model.smd.timelag <- estimates.CI(publication.bias.model.smd.timelag)
According to this uni-moderator meta-regression, there was a indication of decline effects since the slope of the moderator ‘year of publication’ is marginally statistically significant for lnRR (slope = 0.01, 95% CI = [0,0.02]; R2marginal = 1.6%) and SMD (slope = 0.03, 95% CI = [0,0.06]; R2marginal = 1.6%). Nonetheless, as mentioned above, we need to confirm this pattern after accounting for some of the heterogeneity present in the data using the all-in publication bias test (multi-moderator meta-regression) described in section S4.2.3.3.
The best evidence of small-study (publication bias) and decline effects (time-lag bias) comes from multi-moderator meta-regression. Therefore, we ombined Equations 27 and 23 with other moderators since those additional moderators generally explained or were predicted to explain some of the heterogeneity. For this example, we ran a multi-moderator meta-regression including the square of inverse of effective sample size, the year of publication (mean-centred) and a moderator originally included in Murphy et al. (2013), that is, the variable of study type: ‘Experimental.or.Observational.’
# preparing the moderators that need to be included in a meta-regression that also contains a moderator with the standard errors of the effect sizes and the year of publication
# Application of Equation 29 from the main manuscript
publication.bias.model.rr.all.srin <- rma.mv(yi, vi,
mods= ~ -1 + sqrt_inv_n_tilda +
year.c +
Experimental.or.Observational.,
random=list(
~1 |Disturbance_category,
~ 1 | obsID,
~ 1 | paperID),
method="REML",
test = "t", # using t dist rather than z because t is a better, more conservative distribution than z
data=dataset.rr,
control=list(optimizer="optim", optmethod="Nelder-Mead"))
publication.bias.model.smd.all.srin <- rma.mv(yi, vi,
mods= ~ -1 + sqrt_inv_n_tilda +
year.c +
Experimental.or.Observational.,
random=list(
~1 |Disturbance_category,
~ 1 | obsID,
~ 1 | paperID),
method="REML",
test = "t", # using t dist rather than z because t is a better, more conservative distribution than z
data=dataset.smd,
control=list(optimizer="optim", optmethod="Nelder-Mead"))
# extracting the mean and 95% confidence intervals
estimates.publication.bias.model.rr.all.srin <- estimates.CI(publication.bias.model.rr.all.srin)
estimates.publication.bias.model.smd.all.srin <- estimates.CI(publication.bias.model.smd.all.srin)
The all-in publication bias test was not fully consistent with what we observed in the uni-moderator meta-regressions above (sections S4.2.4.1 and S4.2.4.2). First, the multi-moderator meta-regression showed a statistically significant slope for the moderator ‘sqrt(inv_n_tilda)’ (slope = -0.25, 95% CI = [-0.47,-0.03]; Figure S4.10 and S4.11), confirming evidence of small-study effects. This result did not agree with the findings originally reported in Murphy et al. (2013). The authors found no evidence of publication bias using funnel plot with sample size which showed a clear funnel shape with a much greater spread of studies with small sample sizes and a decrease in this spread as sample size increases. Notae that do not recommend to use this method for meta-analytic datasets with multiple levels of non-independence and high heterogeneity. Second, the all-in publication bias test showed that there no evidence of decline effects for lnRR since the moderator ‘year of publication’ was indistinguishable from zero (slope = 0.01, 95% CI = [0,0.02]; Figure S4.12). But the all-in publication bias test found a indication of decline effects for smd since the corresponding slope was marginally statsitically significant (slope = 0.03, 95% CI = [-0.01,0.06]; Figure S4.13).
# to plot the 'sqrt(inv_n_tilda)' effect we are simply including the moderator 'Experimental.or.Observational.' as a random effect for simplicity of plotting. Note that this does not substantially change the estimates of the moderators, and that alternative approaches could be used for this.
publication.bias.model.rr.all.srin.plot <- rma.mv(yi, vi,
mods= ~ -1 + sqrt_inv_n_tilda +
year.c,
random=list(~1 | +
Experimental.or.Observational.,
~1 |Disturbance_category,
~ 1 | obsID,
~ 1 | paperID),
method="REML",
test = "t", # using t dist rather than z because t is a better, more conservative distribution than z
data=dataset.rr,
control=list(optimizer="optim", optmethod="Nelder-Mead"))
publication.bias.model.smd.all.srin.plot <- rma.mv(yi, vi,
mods= ~ -1 + sqrt_inv_n_tilda +
year.c,
random=list(~1|
Experimental.or.Observational.,
#~1 |Disturbance_category,
~ 1 | obsID,
~ 1 | paperID),
method="REML",
test = "t", # using t dist rather than z because t is a better, more conservative distribution than z
data=dataset.smd,
control=list(optimizer="optim", optmethod="Nelder-Mead")) # convergence issue when incorporating as random effect
lnRR with larger uncertainty (square root of inverse of effective sample size) is larger, showing evidence of small-study effects in the meta-analytic dataset. The solid line represents the model estimate and the shading shows its 95% confidence intervals.
# predicting effect sizes for a range of sqrt(inv_n_tilda) from the minimum
# observed in the data to the maximum observed in data, while year is kept at its
# mean value, in this case 0 since year was mean-centred
predict.publication.bias.model.rr.all.srin.plot.1 <- predict(publication.bias.model.rr.all.srin.plot,
newmods = cbind(seq(min(dataset.rr$sqrt_inv_n_tilda), max(dataset.rr$sqrt_inv_n_tilda),
length.out = 324), c(0)))
newdat <- data.frame(sei = seq(min(dataset.rr$sqrt_inv_n_tilda), max(dataset.rr$sqrt_inv_n_tilda),
length.out = 324), fit = predict.publication.bias.model.rr.all.srin.plot.1$pred,
upper = predict.publication.bias.model.rr.all.srin.plot.1$ci.ub, lower = predict.publication.bias.model.rr.all.srin.plot.1$ci.lb,
stringsAsFactors = FALSE)
xaxis <- dataset.rr$sqrt_inv_n_tilda
yaxis <- dataset.rr$yi
plot(xaxis, yaxis, type = "n", ylab = "", xlab = "", xaxt = "n", yaxt = "n")
abline(a = 0, b = 0, lwd = 1, lty = 1)
axis(1, at = seq(0, 1.6, 0.2), cex.axis = 0.8, tck = -0.02)
axis(2, at = round(seq(-3, 2, 0.5), 1), cex.axis = 0.8, las = 2, tck = -0.02)
title(xlab = "square root of inverse of effective sample size", ylab = "effect size (lnRR)",
line = 2.75, cex.lab = 1.4)
points(xaxis, yaxis, bg = rgb(0, 0, 0, 0.1), col = rgb(0, 0, 0, 0.2), pch = 21, cex = 1)
lines(newdat$sei, newdat$fit, lwd = 2.75, col = "darkorchid4")
polygon(c(newdat$sei, rev(newdat$sei)), c(newdat$lower, rev(newdat$upper)), border = NA,
col = rgb(104/255, 34/255, 139/255, 0.5))
SMD with larger uncertainty (square root of inverse of effective sample size) is larger, showing evidence of small-study effects in the meta-analytic dataset. The solid line represents the model estimate and the shading shows its 95% confidence intervals.
# predicting effect sizes for a range of SE from the minimum observed in the data
# to the maximum observed in data, while year is kept at its mean value, in this
# case 0 since year was mean-centred
predict.publication.bias.model.smd.all.srin.plot.1 <- predict(publication.bias.model.smd.all.srin.plot,
newmods = cbind(seq(min(dataset.smd$sqrt_inv_n_tilda), max(dataset.smd$sqrt_inv_n_tilda),
length.out = 320), c(0)))
newdat <- data.frame(sei = seq(min(dataset.smd$sqrt_inv_n_tilda), max(dataset.smd$sqrt_inv_n_tilda),
length.out = 320), fit = predict.publication.bias.model.smd.all.srin.plot.1$pred,
upper = predict.publication.bias.model.smd.all.srin.plot.1$ci.ub, lower = predict.publication.bias.model.smd.all.srin.plot.1$ci.lb,
stringsAsFactors = FALSE)
xaxis <- dataset.smd$sqrt_inv_n_tilda
yaxis <- dataset.smd$yi
plot(xaxis, yaxis, type = "n", ylab = "", xlab = "", xaxt = "n", yaxt = "n")
abline(a = 0, b = 0, lwd = 1, lty = 1)
axis(1, at = seq(0, 1.2, 0.2), cex.axis = 0.8, tck = -0.02)
axis(2, at = round(seq(-8, 4, 2), 1), cex.axis = 0.8, las = 2, tck = -0.02)
title(xlab = "square root of inverse of effective sample size", ylab = "effect size (SMD)",
line = 2.75, cex.lab = 1.4)
points(xaxis, yaxis, bg = rgb(0, 0, 0, 0.1), col = rgb(0, 0, 0, 0.2), pch = 21, cex = 1)
lines(newdat$sei, newdat$fit, lwd = 2.75, col = "darkorchid4")
polygon(c(newdat$sei, rev(newdat$sei)), c(newdat$lower, rev(newdat$upper)), border = NA,
col = rgb(104/255, 34/255, 139/255, 0.5))
The averaged lnRR has remained seemingly unchanged over time since the first effect size was published. The solid line represents the model estimate, the shading shows its 95% confidence intervals.
# predicting effect sizes for a range of sqrt(inv_n_tilda) from the minimum
# observed in the data to the maximum observed in data, while year is kept at its
# mean value, in this case 0 since year was mean-centred
predict.publication.bias.model.rr.all.srin.plot.2 <- predict(publication.bias.model.rr.all.srin.plot,
newmods = cbind(mean(dataset.rr$sqrt_inv_n_tilda), seq(min(dataset.rr$year.c),
max(dataset.rr$year.c), length.out = 324)))
newdat <- data.frame(year = seq(min(dataset.rr$year), max(dataset.rr$year), length.out = 324),
fit = predict.publication.bias.model.rr.all.srin.plot.2$pred, upper = predict.publication.bias.model.rr.all.srin.plot.2$ci.ub,
lower = predict.publication.bias.model.rr.all.srin.plot.2$ci.lb, stringsAsFactors = FALSE)
xaxis <- dataset.rr$year
yaxis <- dataset.rr$yi
plot(xaxis, yaxis, type = "n", ylab = "", xlab = "", xaxt = "n", yaxt = "n")
abline(a = 0, b = 0, lwd = 1, lty = 1)
axis(1, at = seq(1989, 2013, 5), cex.axis = 0.8, tck = -0.02)
axis(2, at = round(seq(-3, 2, 0.5), 1), cex.axis = 0.8, las = 2, tck = -0.02)
title(xlab = "year of publication", ylab = "effect size (lnRR)", line = 2.75, cex.lab = 1.4)
points(xaxis, yaxis, bg = rgb(0, 0, 0, 0.1), col = rgb(0, 0, 0, 0.2), pch = 21, cex = 1)
lines(newdat$year, newdat$fit, lwd = 2.75, col = "darkorchid4")
polygon(c(newdat$year, rev(newdat$year)), c(newdat$lower, rev(newdat$upper)), border = NA,
col = rgb(104/255, 34/255, 139/255, 0.5))
The averaged SMD tended to decline over time since the first effect size was published. The solid line represents the model estimate, the shading shows its 95% confidence intervals.
# predicting effect sizes for a range of sqrt(inv_n_tilda) from the minimum
# observed in the data to the maximum observed in data, while year is kept at its
# mean value, in this case 0 since year was mean-centred
predict.publication.bias.model.smd.all.srin.plot.2 <- predict(publication.bias.model.smd.all.srin.plot,
newmods = cbind(mean(dataset.smd$sqrt_inv_n_tilda), seq(min(dataset.smd$year.c),
max(dataset.smd$year.c), length.out = 320)))
newdat <- data.frame(year = seq(min(dataset.smd$year), max(dataset.smd$year), length.out = 320),
fit = predict.publication.bias.model.smd.all.srin.plot.2$pred, upper = predict.publication.bias.model.smd.all.srin.plot.2$ci.ub,
lower = predict.publication.bias.model.smd.all.srin.plot.2$ci.lb, stringsAsFactors = FALSE)
xaxis <- dataset.smd$year
yaxis <- dataset.smd$yi
plot(xaxis, yaxis, type = "n", ylab = "", xlab = "", xaxt = "n", yaxt = "n")
abline(a = 0, b = 0, lwd = 1, lty = 1)
axis(1, at = seq(1989, 2013, 5), cex.axis = 0.8, tck = -0.02)
axis(2, at = round(seq(-8, 4, 2), 1), cex.axis = 0.8, las = 2, tck = -0.02)
title(xlab = "year of publication", ylab = "effect size (SMD)", line = 2.75, cex.lab = 1.4)
points(xaxis, yaxis, bg = rgb(0, 0, 0, 0.1), col = rgb(0, 0, 0, 0.2), pch = 21, cex = 1)
lines(newdat$year, newdat$fit, lwd = 2.75, col = "darkorchid4")
polygon(c(newdat$year, rev(newdat$year)), c(newdat$lower, rev(newdat$upper)), border = NA,
col = rgb(104/255, 34/255, 139/255, 0.5))
We also provided a two-step approach to calculate the adjusted overall effect size, that is, the overall effect after we accounted for publication bias (see the main text). The rationale is that when no uncertainty (theoretically) exist, we can get the potential true effect from multiple-moderator multilevel-model (i.e. all-in publication bias test). Note that if the slope for the moderator ‘sqrt_inv_n_tilda’ in the all-in publication bias test shown above was not statistically significant, then the best estimate for the adjusted overall effect size would be the intercept of that all-in publication bias test. Otherwise, we should use inv_n_tilda instead of as moderator. We detailed the reasons in our main text. What moderators to include in this version of the all-in publication bias test depends on what we want the adjusted overall effect size to reflect, but also keeping in mind that one of the purposes of including moderators in this all-in test is to explain as much heterogeneity as possible. We are going to fit three different models to showcase some of these possibilities. Importantly, we should keep in mind that publication bias tests should be treated as sensitivity analyses to test the robustness of results from the original analysis (Noble et al., 2017) and that none of the estimated adjusted overall effects should be treated as the real overall value if publication bias did not exist; after all, we can never be sure of what has not been published. The three models that we are going to fit are:
Multi-moderator meta-regression only including the inverse of effective sample size and year of publication (mean-centred) as moderators. The intercept of this meta-regression will represent the adjusted overall effect size.
Multi-moderator meta-regression including the inverse of effective sample size and year of publication (mean-centred) as moderators, and ‘Experimental.or.Observational.’ which has five levels, as a random effect. The goal of this meta-regression is to provide an adjusted overall effect size after accouting for the variance explained by ‘Experimental.or.Observational.’ Bear in mind that other alternative approaches might be used (e.g. mean-centring the categorical factor following Schielzeth 2010), but results should not change substantially.
Multi-moderator meta-regression including the inverse of effective sample size, year of publication (mean-centred) and the categorical factor ‘Experimental.or.Observational.’ which has five levels, as moderators. In this meta-regression, we are going to remove the intercept so that we obtain an adjusted overall effect for each of the levels of the factor ‘captivity status.’
# lnRR
# model 1: ignoring additional moderators - Experimental.or.Observational.
publication.bias.model.rr.v.1 <- rma.mv(yi, vi,
mods=~1+inv_n_tilda +
year.c,
random=list(
~1 |Disturbance_category,
~ 1 | obsID,
~ 1 | paperID),
method="REML",
test = "t", # using t dist rather than z because t is a better, more conservative distribution than z
data=dataset.rr,
control=list(optimizer="optim", optmethod="Nelder-Mead"))
# model 2: adding Experimental.or.Observational. as a random effect to provide an adjusted overall effect across levels
publication.bias.model.rr.v.2 <- rma.mv(yi, vi,
mods=~1+inv_n_tilda +
year.c,
random=list(
~1 |Disturbance_category,
~1|Experimental.or.Observational.,
~ 1 | obsID,
~ 1 | paperID),
method="REML",
test = "t", # using t dist rather than z because t is a better, more conservative distribution than z
data=dataset.rr,
control=list(optimizer="optim", optmethod="Nelder-Mead"))
# model 3: providing an adjusted effect for each level of CaptivityC
publication.bias.model.rr.v.3 <- rma.mv(yi, vi,
mods=~-1+inv_n_tilda +
year.c +
Experimental.or.Observational., # take Class - that is already a part of phylo
random=list(
~1 |Disturbance_category,
~ 1 | obsID,
~ 1 | paperID),
method="REML",
test = "t", # using t dist rather than z because t is a better, more conservative distribution than z
data=dataset.rr,
control=list(optimizer="optim", optmethod="Nelder-Mead"))
# model 3b: comparing results without correcting for publication bias
publication.bias.model.rr.v.3b <- rma.mv(yi, vi,
mods=~-1+ Experimental.or.Observational., # take Class - that is already a part of phylo
random=list(
~1 |Disturbance_category,
~ 1 | obsID,
~ 1 | paperID),
method="REML",
test = "t", # using t dist rather than z because t is a better, more conservative distribution than z
data=dataset.rr,
control=list(optimizer="optim", optmethod="Nelder-Mead"))
# extracting the mean and 95% confidence intervals
estimates.publication.bias.model.rr.v.1 <- estimates.CI(publication.bias.model.rr.v.1)
estimates.publication.bias.model.rr.v.2 <- estimates.CI(publication.bias.model.rr.v.2)
estimates.publication.bias.model.rr.v.3 <- estimates.CI(publication.bias.model.rr.v.3)
estimates.publication.bias.model.rr.v.3b <- estimates.CI(publication.bias.model.rr.v.3b)
# SMD
# model 1: ignoring additional moderators - Experimental.or.Observational.
publication.bias.model.smd.v.1 <- rma.mv(yi, vi,
mods=~1+inv_n_tilda +
year.c,
random=list(
~1 |Disturbance_category,
~ 1 | obsID,
~ 1 | paperID),
method="REML",
test = "t", # using t dist rather than z because t is a better, more conservative distribution than z
data=dataset.smd,
control=list(optimizer="optim", optmethod="Nelder-Mead"))
# model 2: adding Experimental.or.Observational. as a random effect to provide an adjusted overall effect across levels
publication.bias.model.smd.v.2 <- rma.mv(yi, vi,
mods=~1+inv_n_tilda +
year.c,
random=list(
~1 |Disturbance_category,
#~1|Experimental.or.Observational.,
~ 1 | obsID,
~ 1 | paperID),
method="REML",
test = "t", # using t dist rather than z because t is a better, more conservative distribution than z
data=dataset.smd,
control=list(optimizer="optim", optmethod="Nelder-Mead"))
# model 3: providing an adjusted effect for each level of CaptivityC
publication.bias.model.smd.v.3 <- rma.mv(yi, vi,
mods=~-1+inv_n_tilda +
year.c +
Experimental.or.Observational., # take Class - that is already a part of phylo
random=list(
~1 |Disturbance_category,
~ 1 | obsID,
~ 1 | paperID),
method="REML",
test = "t", # using t dist rather than z because t is a better, more conservative distribution than z
data=dataset.smd,
control=list(optimizer="optim", optmethod="Nelder-Mead"))
# model 3b: comparing results without correcting for publication bias
publication.bias.model.smd.v.3b <- rma.mv(yi, vi,
mods=~-1+ Experimental.or.Observational., # take Class - that is already a part of phylo
random=list(
~1 |Disturbance_category,
~ 1 | obsID,
~ 1 | paperID),
method="REML",
test = "t", # using t dist rather than z because t is a better, more conservative distribution than z
data=dataset.smd,
control=list(optimizer="optim", optmethod="Nelder-Mead"))
# extracting the mean and 95% confidence intervals
estimates.publication.bias.model.smd.v.1 <- estimates.CI(publication.bias.model.smd.v.1)
estimates.publication.bias.model.smd.v.2 <- estimates.CI(publication.bias.model.smd.v.2)
estimates.publication.bias.model.smd.v.3 <- estimates.CI(publication.bias.model.smd.v.3)
estimates.publication.bias.model.smd.v.3b <- estimates.CI(publication.bias.model.smd.v.3b)
Model 1 estimated an adjusted overall lnRR that was obviously smaller (adjusted effect = -0.09, 95% CI = [-0.21,0.03]) than that obtained by the multilevel meta-analysis (unadjusted effect = -0.16, 95% CI = [-0.24,-0.09]; more in section S4.2.2), highlithing the downward correction of the overall effect after accounting for publication bias and time-lag bias. For this worked example, model 2 provides the same exact adjusted overall effect (adjusted effectcaptivity = -0.09, 95% CI = [-0.27,0.1]) than model 1 since the variance explained by the random effect ‘Experimental.or.Observational.’ is very small (actually the Experimental.or.Observational. should not be included as random effect since it only has two levels). The latter result agrees with the low variance explained by ‘Experimental.or.Observational.’ when included as a moderator (R2marginal = 4.9%). Ideally, the moderators included in the all-in publication bias test would explain more variance, but this will depend on the specific meta-analytic dataset being analyzed. Last, model 3 also showed slightly smaller effects for all levels of the categorical factor ‘Experimental.or.Observational.’ that the unadjusted model (Table S4.3).
# creating a table to show the results of model 2
table.comparing.Exp.or.Obs.levels <- merge(estimates.publication.bias.model.rr.v.3[c(3:4),
], estimates.publication.bias.model.rr.v.3b, by = "estimate", all.x = T)
# rounding estimates
table.comparing.Exp.or.Obs.levels <- table.comparing.Exp.or.Obs.levels %>% mutate_at(vars(mean.x,
lower.x, upper.x, mean.y, lower.y, upper.y), funs(round(., 2)))
table.comparing.Exp.or.Obs.levels <- data.frame(Exp.or.Obs.level = table.comparing.Exp.or.Obs.levels[,
1], adjusted.mean = table.comparing.Exp.or.Obs.levels[, 2], adjusted.CI = paste0("[",
table.comparing.Exp.or.Obs.levels[, 3], ",", table.comparing.Exp.or.Obs.levels[,
4], "]"), unadjusted.mean = table.comparing.Exp.or.Obs.levels[, 5], unadjusted.CI = paste0("[",
table.comparing.Exp.or.Obs.levels[, 6], ",", table.comparing.Exp.or.Obs.levels[,
7], "]"))
table.comparing.Exp.or.Obs.levels[, 1] <- c("Experimental", "Observational")
# creating a fancy table using the R package 'gt'
table.comparing.Exp.or.Obs.levels.gt <- table.comparing.Exp.or.Obs.levels %>% gt() %>%
cols_label(Exp.or.Obs.level = md("**Experimental.or.Observational.**"), adjusted.mean = md("**Adjusted mean**"),
adjusted.CI = md("**Adjusted 95% CI**"), unadjusted.mean = md("**Unadjusted mean**"),
unadjusted.CI = md("**Unadjusted 95% CI**")) %>% cols_align(align = "center")
table.comparing.Exp.or.Obs.levels.gt
| Experimental.or.Observational. | Adjusted mean | Adjusted 95% CI | Unadjusted mean | Unadjusted 95% CI |
|---|---|---|---|---|
| Experimental | 0.00 | [-0.1,0.1] | -0.08 | [-0.15,0] |
| Observational | -0.17 | [-0.26,-0.09] | -0.25 | [-0.31,-0.18] |
# creating a table to show the results of model 2
table.comparing.Exp.or.Obs.levels <- merge(estimates.publication.bias.model.smd.v.3[c(3:4),
], estimates.publication.bias.model.smd.v.3b, by = "estimate", all.x = T)
# rounding estimates
table.comparing.Exp.or.Obs.levels <- table.comparing.Exp.or.Obs.levels %>% mutate_at(vars(mean.x,
lower.x, upper.x, mean.y, lower.y, upper.y), funs(round(., 2)))
table.comparing.Exp.or.Obs.levels <- data.frame(Exp.or.Obs.level = table.comparing.Exp.or.Obs.levels[,
1], adjusted.mean = table.comparing.Exp.or.Obs.levels[, 2], adjusted.CI = paste0("[",
table.comparing.Exp.or.Obs.levels[, 3], ",", table.comparing.Exp.or.Obs.levels[,
4], "]"), unadjusted.mean = table.comparing.Exp.or.Obs.levels[, 5], unadjusted.CI = paste0("[",
table.comparing.Exp.or.Obs.levels[, 6], ",", table.comparing.Exp.or.Obs.levels[,
7], "]"))
table.comparing.Exp.or.Obs.levels[, 1] <- c("Experimental", "Observational")
# creating a fancy table using the R package 'gt'
table.comparing.Exp.or.Obs.levels.gt <- table.comparing.Exp.or.Obs.levels %>% gt() %>%
cols_label(Exp.or.Obs.level = md("**Experimental.or.Observational.**"), adjusted.mean = md("**Adjusted mean**"),
adjusted.CI = md("**Adjusted 95% CI**"), unadjusted.mean = md("**Unadjusted mean**"),
unadjusted.CI = md("**Unadjusted 95% CI**")) %>% cols_align(align = "center")
table.comparing.Exp.or.Obs.levels.gt
| Experimental.or.Observational. | Adjusted mean | Adjusted 95% CI | Unadjusted mean | Unadjusted 95% CI |
|---|---|---|---|---|
| Experimental | -0.10 | [-0.45,0.25] | -0.37 | [-0.6,-0.13] |
| Observational | -0.37 | [-0.64,-0.09] | -0.57 | [-0.76,-0.39] |
Model 1 estimated an adjusted overall SMD that was obviously smaller (adjusted effect = -0.28, 95% CI = [-0.56,0]) than that obtained by the multilevel meta-analysis (unadjusted effect = -0.49, 95% CI = [-0.64,-0.35]; more in section S4.2.2), highlithing the downward correction of the overall effect after accounting for publication bias and time-lag bias. For this worked example, model 2 provides the same exact adjusted overall effect (adjusted effectcaptivity = -0.28, 95% CI = [-0.56,0]) than model 1 since the variance explained by the random effect ‘Experimental.or.Observational.’ is very small (actually the Experimental.or.Observational. should not be included as random effect since it only has two levels). The latter result agrees with the low variance explained by ‘Experimental.or.Observational.’ when included as a moderator (R2marginal = 0.9%). Ideally, the moderators included in the all-in publication bias test would explain more variance, but this will depend on the specific meta-analytic dataset being analyzed. Last, model 3 also showed slightly smaller effects for all levels of the categorical factor ‘Experimental.or.Observational.’ that the unadjusted model (Table S4.4).
In this section, we use the two meta-analytic datasets used in section S4 to provide worked examples for how to use alternative approaches to our suggested multilevel meta-regression method (see main text). The first step will be to deal with the non-independence present in the data, which we will do by aggregating (averaging) effect sizes per study (section S5.1.2) or by sampling one effect size per study (section S5.1.1). Keep in mind that neither averaging nor sampling are general solutions when we have a phylogenetic signal as it is the case in the dataset from Garamszegi et al. (2012; Table S4.1) since neither aggregating nor sampling will eliminate the non-independence stemming from the phylogenetic relationships among species (more in the main text). Nonetheless, we will ignore this phylogenetic non-independence for the purposes of providing the worked examples.
The use of aggregating and sampling will allow us to make use of traditional publication bias tests (see section 3 from the main text) such as the trim-and-fill method (section 5.1.1.1), selection models (section 5.1.1.2), and cumulative meta-analysis (section 5.1.1.3).
To aggregate (average) effect sizes per study we will use the function ‘aggregate()’ from the R package metafor (Viechtbauer 2010) using a compound symmetric structure (‘struct=CS’) and a value of 0.5 for rho following Noble et al. (2017). That is, when averaging effect sizes per study we are not assuming that the effect sizes of a study are independent to each other, but instead that they are correlated and, since we do not know the real value of that correlation, we will make an educated guess and use 0.5. The function ‘aggregate()’ will then combine multiple estimates within the same study using inverse-variance weighting taking the 0.5 correlation into consideration. More information about the function ‘aggregate()’ can be found by typing ?metafor::aggregate, including how ‘aggregate()’ deals with other variables (e.g. moderators) in the dataset.
# you will need the development version of 'metafor' package to run this bit rho
# = 0.5 is as in Noble et al (Mol Ecol)
dataset.r <- escalc(yi = yi, vi = vi, data = dataset.r)
dataset.r.agg <- aggregate(dataset.r, cluster = study, struct = "CS", rho = 0.5)
Using the aggregated data (see above), we will then fit the trim-and-fill method (Duval and Tweedie 2000a,b). The trim-and-fill method is a non-parametric method that can be used to both detect and correct for publication bias and, although it can handle some heterogeneity, it performs worse as heterogeneity increases (Peters et al., 2007; Moreno et al., 2009; more in the main text). Since we aggregated our data, we can know run a random-effects meta-analytic model rather than a multilevel model as used in section S.4, and we will run the trim-and-fill method using this model.
# random/mixed-effects meta-analytic model
meta.analysis.model.r.agg <- rma(yi, vi, test = "knha", data = dataset.r, slab = study)
# trim-and-fill method
trimfill.r <- trimfill(meta.analysis.model.r.agg)
The trim-and-fill method identified 15 (SE = 6.65) missing studies (Figure S5.1) and estimated an adjusted overall effect of 0.2 (95% CI = [0.13,0.27]), which is slighlty lower than the adjusted overall effect estimated by our suggested mulitlevel meta-regression method (0.26, 95% CI = [0.12,0.4]; more in section S4.1.4.3).
Funnel plot using precision (1/SE) as a measure of uncertainty and showing the adjusted meta-analytic mean (vertical line) and the missing effect sizes (white points) according to the trim-and-fill method.
funnel(trimfill.r, yaxis = "seinv")
Using the aggregated data (see above), we will also fit a selection model-based method (reviewed by Marks-Anglin & Chen, 2020). Although there are many selection models, all of them model how effect sizes are missing based on p values, effect sizes and/or sampling variance, and can provide an adjusted overall effect (e.g. Rodgers & Pustejovsky, 2020; more in the main text). Contrary to other methods, selection models can deal with and model heterogeneity (e.g. Citkowicz and Vevea 2017), but cannot deal with non-independent effect sizes. We will run a step function selection model based on one cut-point (α = 0.05); this model is sometimes referred to as a 3 parameter selection model (3PM; more in the main text; Rodgers & Pustejovsky, 2020; more by typing ?metafor::selmodel).
# without moderators
meta.analysis.r.agg0 <- rma(yi, vi,
method="ML", #method needs to set to "ML" rather than "REML" as in all models above
test="knha", #knha is meant to be better than Z and t, but it is not implemented for multilevel models
data=dataset.r)
three.PSM.r0 <- selmodel(meta.analysis.r.agg0, type="stepfun", steps=c(0.05,1)) #note that steps=c(0.05) would be give the same results
# look at confidence intervals - not just point estimate
# summary(three.PSM.r0)
# with moderators
meta.regression.r.agg <- rma(yi, vi,
mod = ~ year.c +
CaptivityC - 1,
method="ML",
test="knha",
data=dataset.r)
three.PSM.r <- selmodel(meta.regression.r.agg, type="stepfun", steps=c(0.05)) #note that steps=c(0.05,1) would be give the same results
# summary(three.PSM.r)
# extracting the mean and 95% confidence intervals
estimates.three.PSM.r <- estimates.CI(three.PSM.r)
The step function selection model based on one cut-point (α = 0.05) for the intercept-only model estimated an adjusted overall effect of 0.31 (95% CI = [0.21,0.41]; Figure S5.2), which is larger than the original overall effect estimated by the meta-analytic model (0.29, 95% CI = [0.17,0.41]; more in section S4.1.4.3). Overall, the selection model estimated adjusted overall effects that were either very similar or even slightly larger than the original unadjusted effect sizes (Table S5.1), which might indicate that the specific selection model that we used (i.e. one cut-point (α = 0.05)) was not an adequate approximation for the underlying selection process.
Results of a step function selection model based on one cut-point (α = 0.05) for a model without any moderators (left-hand figure) and a model including both ‘year of publication’ and ‘captivity status’ as moderators (right-hand figure).
par(mfrow = c(1, 2))
plot(three.PSM.r0, ylim = c(0, 2), col = "red", lty = 3)
plot(three.PSM.r, ylim = c(0, 2), col = "red", lty = 3)
# creating a table to show the results of the selection model
table.comparing.captivity.levels.3PSM <- merge(estimates.three.PSM.r[c(2:6), ], estimates.publication.bias.model.r.v.3b,
by = "estimate", all.x = T)
# rounding estimates
table.comparing.captivity.levels.3PSM <- table.comparing.captivity.levels.3PSM %>%
mutate_at(vars(mean.x, lower.x, upper.x, mean.y, lower.y, upper.y), funs(round(.,
2)))
table.comparing.captivity.levels.3PSM <- data.frame(captivity.level = table.comparing.captivity.levels.3PSM[,
1], adjusted.mean = table.comparing.captivity.levels.3PSM[, 2], adjusted.CI = paste0("[",
table.comparing.captivity.levels.3PSM[, 3], ",", table.comparing.captivity.levels.3PSM[,
4], "]"), unadjusted.mean = table.comparing.captivity.levels.3PSM[, 5], unadjusted.CI = paste0("[",
table.comparing.captivity.levels.3PSM[, 6], ",", table.comparing.captivity.levels.3PSM[,
7], "]"))
table.comparing.captivity.levels.3PSM[, 1] <- c("FC", "FW", "WCT", "WCT & FC", "WCT & FW")
# creating a fancy table using the R package 'gt'
table.model.r.captivity.gt.3PSM <- table.comparing.captivity.levels.3PSM %>% gt() %>%
cols_label(captivity.level = md("**Captivity status**"), adjusted.mean = md("**Adjusted mean**"),
adjusted.CI = md("**Adjusted 95% CI**"), unadjusted.mean = md("**Unadjusted mean**"),
unadjusted.CI = md("**Unadjusted 95% CI**")) %>% cols_align(align = "center")
table.model.r.captivity.gt.3PSM
| Captivity status | Adjusted mean | Adjusted 95% CI | Unadjusted mean | Unadjusted 95% CI |
|---|---|---|---|---|
| FC | 0.29 | [0.17,0.41] | 0.28 | [0.15,0.41] |
| FW | 0.26 | [0.06,0.47] | 0.27 | [0.05,0.49] |
| WCT | 0.36 | [0.24,0.48] | 0.30 | [0.16,0.45] |
| WCT & FC | 0.41 | [0.1,0.72] | 0.43 | [0.09,0.78] |
| WCT & FW | 0.12 | [-0.14,0.38] | 0.16 | [-0.12,0.43] |
Table S5.1. Adjusted and unadjusted for publication bias results for all levels of the moderator ‘captivity status’ using a step function selection model based on one cut-point (α = 0.05). Estimates are presented as standardized effect sizes using Fisher’s transformation (i.e. Zr).
Last, using the aggregated data (see above), we will also perform a cumulative meta-analysis using the fuction ‘cumul()’ from the R package metafor (Viechtbauer 2010). By displaying the results of this cumulative meta-analysis as a forest plot, we can visually observe if there is evidence of decline effects (more in the text), which we do not seem to have for this data (Figure S5.3).
Forest plot showing the results of a cumulative meta-analysis and showing no clear evidence of decline effects.
cum.meta.analysis.model.r.agg <- cumul(meta.analysis.model.r.agg, order = order(dataset.r$year))
forest(cum.meta.analysis.model.r.agg, xlab = "Overall estimate (Zr)", digits = c(2,
1), cex = 0.3, header = "Author(s) and year")
To sample one effect size per study we will make use of for loops. These for loops will (i) select one effect size per study at a time to generate a meta-analytic dataset, (ii) fit the publication bias test of interest, and (iii) extract estimates from the publication bias test output. This process will be repeated 1000 times (i.e. 1000 samplings). Notice that we are going to create a for loop for each of the sections below (S5.1.2.1-S5.1.2.3), but authors may want to use a single for loop to make the process more efficient (and comparable).
For an introduction to the trim-and-fill method see section S5.1.1.1 and the main text. Below are the summary results of the trim-and-fill method after running 1000 randomizations.
# for loop to extract one single effect size per study and running trim-and-fill
n.randomization.r.TAF <- 1000 #number of samplings to be done
# # vector to collect the estimated number of missing studies for each database
# trimfill.n.vector.r <- c() # vector to collect the ajusted effect for each
# database trimfill.b.vector.r <- c() for(i in 1:n.randomization.r.TAF){ #
# creating a new dataframe to be filled randomized.db.r <-
# data.frame(study=factor(), yi=numeric(), vi=numeric(), stringsAsFactors=FALSE)
# for (j in levels(as.factor(dataset.r$study))){ #for each study do... #
# creating a dataset for each study tmp.r <-
# dataset.r[dataset.r$study==j,c('study','yi','vi')] # randomly extracting one
# effect size per study randomized.db.r <-
# rbind(randomized.db.r,tmp.r[sample(seq(1,nrow(tmp.r)),1),]) } # running the
# model on the dataframe created in the for loop above model.tmp.r <- rma(yi, vi,
# test='knha',data=randomized.db.r) # extracting the number of missing studies
# according to the trim-and-fill method for each database trimfill.n.vector.r <-
# c(trimfill.n.vector.r,trimfill(model.tmp.r)$k0) # extracting the adjusted
# overall effect according to the trim-and-fill method for each database
# trimfill.b.vector.r <- c(trimfill.b.vector.r,trimfill(model.tmp.r)$beta[[1]]) }
# # creating a dataframe with the n.randomization.r of estimated number of
# missing studies and adjusted overall effects trimfill.sampling.r <-
# data.frame(n=trimfill.n.vector.r, beta=trimfill.b.vector.r) # saving data to
# avoid re-running for loop as that can take a couple of minutes if
# n.randomization.r 1000 or more save(trimfill.sampling.r,file =
# here('data','trimfill_sampling_r_1000s.RData'))
# loading the data extracted from the for loop above to avoid re-running that for
# loop as that can take a couple of minutes if n.randomization.r 1000 or more
load(here("data", "trimfill_sampling_r_1000s.RData"))
The median number of missing studies was 10 (range = 0 - 16; Figure S5.4A), which is seemingly lower than the number of missing studies estimated using the aggregated data (n = 15; more in section S5.1.1.1). Additionally, the median adjusted overall effect according to the trim-and-fill method was 0.21 (range = 0.17 - 0.28; Figure S5.4B), which is very similar to the ajdusted overall effect estimated using the aggreated data (0.2; more in section S5.1.1.1), but still lower than the adjusted overall effect estimated by our suggested multilevel meta-regession method (0.26, 95% CI = [0.12,0.4]; more in section S4.1.4.3).
Distribution of 1000 (A) estimated number of missing studies according to the trim-and-fill method, and (B) adjusted overall effect sizes according to the trim-and-fill method. The dashed line shows the median value.
# plotting the distribution of the number of missing studies according to the
# trim-and-fill method
n.r <- ggplot(data = trimfill.sampling.r, aes(x = n)) + geom_density(alpha = 0.5,
fill = "darkorchid4") + geom_vline(data = trimfill.sampling.r, aes(xintercept = median(n)),
linetype = "dashed", size = 1)
# plotting the distribution of the adjusted overall effect according to the
# trim-and-fill method
betas.r <- ggplot(data = trimfill.sampling.r, aes(x = beta)) + geom_density(alpha = 0.5,
fill = "darkorchid4") + geom_vline(data = trimfill.sampling.r, aes(xintercept = median(beta)),
linetype = "dashed", size = 1)
ggarrange(n.r, betas.r, labels = c("A", "B"), ncol = 2, nrow = 1)
For an introduction to selection models see section S5.1.1.2 and the main text. As before, we will run a step function selection model based on one cut-point (α = 0.05); this model is sometimes referred to as a 3 parameter selection model. For simplicity, we will only run the selection model for the intercept-only model (but see section S5.1.1.2 for how to run a selection model for a meta-regression). Below are the summary results of the 3 parameter selection model after running 1000 randomizations.
# for loop to extract one single effect size per study and running selection
# model
n.randomization.r.3PSM <- 1000
# # vector to collect the ajusted effect for each database threePSM.vector.r <-
# c() for(i in 1:n.randomization.r.3PSM){ # creating a new dataframe to be filled
# randomized.db.r <- data.frame(study=factor(), yi=numeric(), vi=numeric(),
# year.c=numeric(), CaptivityC=character(), stringsAsFactors=FALSE) for (j in
# levels(as.factor(dataset.r$study))){ #for each study do... # creating a
# dataset for each study tmp.r <-
# dataset.r[dataset.r$study==j,c('study','yi','vi','year.c','CaptivityC')] #
# randomly extracting one effect size per study randomized.db.r <-
# rbind(randomized.db.r,tmp.r[sample(seq(1,nrow(tmp.r)),1),]) } # running the
# model on the dataframe created in the for loop above model.tmp.r.agg0<- rma(yi,
# vi, method='ML', test='knha',data=randomized.db.r) # extracting the adjusted
# overall effect according to the selection model for each database
# threePSM.vector.r <- c(threePSM.vector.r,selmodel(model.tmp.r.agg0,
# type='stepfun', steps=c(0.05))$beta[[1]]) } # creating a dataframe with the
# n.randomization.r of estimated number of missing studies and adjusted overall
# effects threePSM.sampling.r <- data.frame(beta=threePSM.vector.r) # saving data
# to avoid re-running for loop as that can take a couple of minutes if
# n.randomization.r 1000 or more save(threePSM.sampling.r,file =
# here('data','3PSM_sampling_r_1000s.RData'))
# loading the data extracted from the for loop above to avoid re-running that for
# loop as that can take a couple of minutes if n.randomization.r 1000 or more
load(here("data", "3PSM_sampling_r_1000s.RData"))
The median adjusted overall effect according to the 3 parameter selection model was 0.29 (range = 0.25 - 0.32; Figure S5.4B), which is slightly smaller than the adjusted overall effect estimated using the aggreated data (0.31; 95% CI = [0.21,0.41]; more in section S5.1.1.2), but still slightly larger than the adjusted overall effect estimated by our suggested mulitlevel meta-regession method (0.26, 95% CI = [0.12,0.4]; more in section S4.1.4.3).
Distribution of 1000 adjusted overall effect sizes according to the 3 parameter selection model. The dashed line shows the median value.
# plotting the distribution of adjusted overall effect sizes according to the 3
# parameter selection model
ggplot(data = threePSM.sampling.r, aes(x = beta)) + geom_density(alpha = 0.5, fill = "darkorchid4") +
geom_vline(data = threePSM.sampling.r, aes(xintercept = median(beta)), linetype = "dashed",
size = 1)
For an introduction to cumulative meta-analysis see section S5.1.1.3 and the main text. Here, we will perform a sampling procedure where we run a cumulative meta-analysis test for each dataset and extract the mean effects estimated by the cumulative meta-analysis. We will then plot the median and 95% quantiles of those mean effects at each step. As before, the forest plot does not show any clear evidence of decline effects in this dataset (Figure S5.6).
# for loop to extract one single effect size per study and running cumulative
# meta-analysis
n.randomization.r.CM <- 1000 #number of samplings to be done
# # vector to collect the effects estimated over time cumul.vector.r <- c() for(i
# in 1:n.randomization.r.CM){ # creating a new dataframe to be filled
# randomized.db.r <- data.frame(study=factor(), yi=numeric(), vi=numeric(),
# year=numeric(), stringsAsFactors=FALSE) for (j in
# levels(as.factor(dataset.r$study))){ #for each study do... # creating a
# dataset for each study tmp.r <-
# dataset.r[dataset.r$study==j,c('study','yi','vi','year')] # randomly extracting
# one effect size per study randomized.db.r <-
# rbind(randomized.db.r,tmp.r[sample(seq(1,nrow(tmp.r)),1),]) } # running the
# model on the dataframe created in the for loop above model.tmp.r <- rma(yi, vi,
# test='knha',data=randomized.db.r) # extracting the mean estimates estimated by
# the cumulative meta-analysis for each database cumul.vector.r <-
# c(cumul.vector.r,as.vector(cumul(model.tmp.r,order=order(randomized.db.r$year))$estimate))
# } # creating a dataframe with the order and the mean estimate from each
# database cumul.sampling.r <-
# data.frame(order=rep(seq(1,nrow(randomized.db.r)),n.randomization.r.CM),
# beta=cumul.vector.r) # saving data to avoid re-running for loop as that can
# take a few minutes if n.randomization.r 1000 or more save(cumul.sampling.r,file
# = here('data','cumul_sampling_r_1000s.RData'))
# loading the data extracted from the for loop above to avoid re-running that for
# loop as that can take a few minutes if n.randomization.r 1000 or more
load(here("data", "cumul_sampling_r_1000s.RData"))
Median and 95% quantiles of 1000 effect sizes estimated according to cumulative meta-analysis showing no evidence of decline effects.
cumul.sampling.r.summarized <- as.data.frame(cumul.sampling.r
%>% group_by(order) %>%
summarise(median = median(beta),
lowCI = quantile(beta,probs = c(0.025)),
hiCI = quantile(beta,probs = c(0.975))))
# reverses the factor level ordering for labels after coord_flip()
cumul.sampling.r.summarized$order <- factor(cumul.sampling.r.summarized$order, levels=rev(cumul.sampling.r.summarized$order))
ggplot(data=cumul.sampling.r.summarized, aes(x=order, y=median, ymin=lowCI, ymax=hiCI)) +
geom_pointrange() +
geom_hline(yintercept=0, lty=2) + # add a dotted line at x=1 after flip
coord_flip() + # flip coordinates (puts labels on y axis)
xlab("Order of publication") + ylab("Mean (95% CI)") +
theme_bw()
# rho = 0.5 is as in Noble et al (Mol Ecol)
dataset.rr <- escalc(yi = yi, vi = vi, data = dataset.rr)
dataset.rr.agg <- aggregate(dataset.rr, cluster = paperID, struct = "CS", rho = 0.5)
# dataset.rr.agg
dataset.smd <- escalc(yi = yi, vi = vi, data = dataset.smd)
dataset.smd.agg <- aggregate(dataset.smd, cluster = paperID, struct = "CS", rho = 0.5)
# dataset.smd.agg
Using the aggregated data (see above), we will then fit the trim-and-fill method (Duval and Tweedie 2000a,b). The trim-and-fill method is a non-parametric method that can be used to both detect and correct for publication bias and, although it can handle some heterogeneity, it performs worse as heterogeneity increases (Peters et al., 2007; Moreno et al., 2009; more in the main text). Since we aggregated our data, we can know run a random-effects meta-analytic model rather than a multilevel model as used in section S.4, and we will run the trim-and-fill method using this model.
# lnRR
meta.analysis.model.rr.agg <- rma(yi, vi, test="knha", # using t dist rather than z
data=dataset.rr.agg)
trimfill.rr <- trimfill(meta.analysis.model.rr.agg)
#SMD
meta.analysis.model.smd.agg <- rma(yi, vi, test="knha", # using t dist rather than z
data=dataset.smd.agg)
trimfill.smd <- trimfill(meta.analysis.model.smd.agg)
The trim-and-fill method identified 0 (SE = 8.88) missing studies (Figure S5.1) and estimated an adjusted overall effect of -0.15 (95% CI = [-0.2,-0.1]).
Funnel plot using precision (1/SE) as a measure of uncertainty and showing the adjusted meta-analytic mean (vertical line) and the missing effect sizes (white points) according to the trim-and-fill method.
par(mfrow = c(1, 2))
funnel(trimfill.rr, yaxis = "seinv", ylab = "Precision (1/SE)", xlab = "Effect size (lnRR)")
funnel(trimfill.smd, yaxis = "seinv", ylab = "Precision (1/SE)", xlab = "Effect size (SMD)")
Using the aggregated data (see above), we will also fit a selection model-based method (reviewed by Marks-Anglin & Chen, 2020). Although there are many selection models, all of them model how effect sizes are missing based on p values, effect sizes and/or sampling variance, and can provide an adjusted overall effect (e.g. Rodgers & Pustejovsky, 2020; more in the main text). Contrary to other methods, selection models can deal with and model heterogeneity (e.g. Citkowicz and Vevea 2017), but cannot deal with non-independent effect sizes. We will run a step function selection model based on one cut-point (α = 0.05); this model is sometimes referred to as a 3 parameter selection model (3PM; more in the main text; Rodgers & Pustejovsky, 2020; more by typing ?metafor::selmodel).
# lnRR
# without moderators
# moderator
meta.regression.rr.agg0 <- rma(yi, vi,
#mod = ~ scale(year),
method="ML",
test="knha",
data=dataset.rr)
three.PSM.rr0 <- selmodel(meta.regression.rr.agg0, type="stepfun", steps=c(0.05, 1))
#summary(three.PSM.rr0)
# with moderator
meta.regression.rr.agg <- rma(yi, vi,
mod = ~ year.c
+ Experimental.or.Observational. - 1,
#+ Class-1,
method="ML",
test="knha",
data=dataset.rr)
three.PSM.rr <- selmodel(meta.regression.rr.agg, type="stepfun", steps=c(0.05, 1))
## SMD
# without moderators
meta.regression.smd.agg0 <- rma(yi, vi,
method="ML",
test="knha",
data=dataset.smd)
three.PSM.smd0 <- selmodel(meta.regression.smd.agg0, type="stepfun", steps=c(0.05, 1))
# with moderators
meta.regression.smd.agg <- rma(yi, vi,
mod = ~ year.c
+ Experimental.or.Observational. - 1,
#+ Class-1,
method="ML",
test="knha",
data=dataset.smd)
three.PSM.smd <- selmodel(meta.regression.smd.agg, type="stepfun", steps=c(0.05, 1))
# extracting the mean and 95% confidence intervals
estimates.three.PSM.rr <- estimates.CI(three.PSM.rr)
estimates.three.PSM.smd <- estimates.CI(three.PSM.smd)
The step function selection model based on one cut-point (α = 0.05) for the intercept-only model estimated an adjusted overall effect of -0.1 (95% CI = [-0.18,-0.01]; Figure S5.2), which is larger than the original overall effect estimated by the meta-analytic model (0.29, 95% CI = [0.17,0.41]; more in section S4.2.3.3). Overall, the selection model estimated adjusted overall effects that were either very similar or even slightly larger than the original unadjusted effect sizes (Table S5.1), which might indicate that the specific selection model that we used (i.e. one cut-point (α = 0.05)) was not an adequate approximation for the underlying selection process.
Results of a step function selection model based on one cut-point (α = 0.05) for a model without any moderators (left-hand figure) and a model including both ‘year of publication’ and ‘Experimental.or.Observational.’ as moderators (right-hand figure).
par(mfrow = c(1, 2))
plot(three.PSM.rr0, col = "red", lty = 3)
plot(three.PSM.rr, col = "red", lty = 3)
par(mfrow = c(1, 2))
plot(three.PSM.smd0, col = "red", lty = 3)
plot(three.PSM.smd, col = "red", lty = 3)
# creating a table to show the results of the selection model
table.comparing.Exp.or.Obs.levels.3PSM <- merge(estimates.three.PSM.rr[c(2:3), ],
estimates.publication.bias.model.rr.v.3b, by = "estimate", all.x = T)
# rounding estimates
table.comparing.Exp.or.Obs.levels.3PSM <- table.comparing.Exp.or.Obs.levels.3PSM %>%
mutate_at(vars(mean.x, lower.x, upper.x, mean.y, lower.y, upper.y), funs(round(.,
2)))
table.comparing.Exp.or.Obs.levels.3PSM <- data.frame(Exp.or.Obs.level = table.comparing.Exp.or.Obs.levels.3PSM[,
1], adjusted.mean = table.comparing.Exp.or.Obs.levels.3PSM[, 2], adjusted.CI = paste0("[",
table.comparing.Exp.or.Obs.levels.3PSM[, 3], ",", table.comparing.Exp.or.Obs.levels.3PSM[,
4], "]"), unadjusted.mean = table.comparing.Exp.or.Obs.levels.3PSM[, 5],
unadjusted.CI = paste0("[", table.comparing.Exp.or.Obs.levels.3PSM[, 6], ",",
table.comparing.Exp.or.Obs.levels.3PSM[, 7], "]"))
table.comparing.Exp.or.Obs.levels.3PSM[, 1] <- c("Experimental", "Observational")
# creating a fancy table using the R package 'gt'
table.model.rr.Exp.or.Obs.gt.3PSM <- table.comparing.Exp.or.Obs.levels.3PSM %>% gt() %>%
cols_label(Exp.or.Obs.level = md("**Experimental.or.Observational.**"), adjusted.mean = md("**Adjusted mean**"),
adjusted.CI = md("**Adjusted 95% CI**"), unadjusted.mean = md("**Unadjusted mean**"),
unadjusted.CI = md("**Unadjusted 95% CI**")) %>% cols_align(align = "center")
table.model.rr.Exp.or.Obs.gt.3PSM
| Experimental.or.Observational. | Adjusted mean | Adjusted 95% CI | Unadjusted mean | Unadjusted 95% CI |
|---|---|---|---|---|
| Experimental | 0.06 | [-0.06,0.19] | -0.08 | [-0.15,0] |
| Observational | -0.19 | [-0.27,-0.11] | -0.25 | [-0.31,-0.18] |
Table S5.2. Adjusted and unadjusted for publication bias results for all levels of the moderator ‘Experimental.or.Observational.’ using a step function selection model based on one cut-point (α = 0.05). Estimates are presented as standardized effect sizes using lnRR.
# creating a table to show the results of the selection model
table.comparing.Exp.or.Obs.levels.3PSM <- merge(estimates.three.PSM.smd[c(2:3), ],
estimates.publication.bias.model.smd.v.3b, by = "estimate", all.x = T)
# rounding estimates
table.comparing.Exp.or.Obs.levels.3PSM <- table.comparing.Exp.or.Obs.levels.3PSM %>%
mutate_at(vars(mean.x, lower.x, upper.x, mean.y, lower.y, upper.y), funs(round(.,
2)))
table.comparing.Exp.or.Obs.levels.3PSM <- data.frame(Exp.or.Obs.level = table.comparing.Exp.or.Obs.levels.3PSM[,
1], adjusted.mean = table.comparing.Exp.or.Obs.levels.3PSM[, 2], adjusted.CI = paste0("[",
table.comparing.Exp.or.Obs.levels.3PSM[, 3], ",", table.comparing.Exp.or.Obs.levels.3PSM[,
4], "]"), unadjusted.mean = table.comparing.Exp.or.Obs.levels.3PSM[, 5],
unadjusted.CI = paste0("[", table.comparing.Exp.or.Obs.levels.3PSM[, 6], ",",
table.comparing.Exp.or.Obs.levels.3PSM[, 7], "]"))
table.comparing.Exp.or.Obs.levels.3PSM[, 1] <- c("Experimental", "Observational")
# creating a fancy table using the R package 'gt'
table.model.smd.Exp.or.Obs.gt.3PSM <- table.comparing.Exp.or.Obs.levels.3PSM %>%
gt() %>% cols_label(Exp.or.Obs.level = md("**Experimental.or.Observational.**"),
adjusted.mean = md("**Adjusted mean**"), adjusted.CI = md("**Adjusted 95% CI**"),
unadjusted.mean = md("**Unadjusted mean**"), unadjusted.CI = md("**Unadjusted 95% CI**")) %>%
cols_align(align = "center")
table.model.smd.Exp.or.Obs.gt.3PSM
| Experimental.or.Observational. | Adjusted mean | Adjusted 95% CI | Unadjusted mean | Unadjusted 95% CI |
|---|---|---|---|---|
| Experimental | -0.14 | [-0.46,0.17] | -0.37 | [-0.6,-0.13] |
| Observational | -0.49 | [-0.71,-0.26] | -0.57 | [-0.76,-0.39] |
Table S5.3. Adjusted and unadjusted for publication bias results for all levels of the moderator ‘Experimental.or.Observational.’ using a step function selection model based on one cut-point (α = 0.05). Estimates are presented as standardized effect sizes using SMD.
Last, using the aggregated data (see above), we will also perform a cumulative meta-analysis using the fuction ‘cumul()’ from the R package metafor (Viechtbauer 2010). By displaying the results of this cumulative meta-analysis as a forest plot, we can visually observe if there is evidence of decline effects (more in the text), which we do not seem to have for this data (Figure S5.9 and S5.10).
# lnRR
cum.meta.analysis.model.rr.agg <- cumul(meta.analysis.model.rr.agg)
# SMD
cum.meta.analysis.model.smd.agg <- cumul(meta.analysis.model.smd.agg)
par(mfrow = c(1, 2))
forest(cum.meta.analysis.model.rr.agg, xlab = "Overall estimate (lnRR)")
forest(cum.meta.analysis.model.smd.agg, xlab = "Overall estimate (SMD)")
To sample one effect size per study we will make use of for loops. These for loops will (i) select one effect size per study at a time to generate a meta-analytic dataset, (ii) fit the publication bias test of interest, and (iii) extract estimates from the publication bias test output. This process will be repeated 1000 times (i.e. 1000 samplings). Notice that we are going to create a for loop for each of the sections below (S5.2.2.1-S5.1.2.3), but authors may want to use a single for loop to make the process more efficient (and comparable).
For an introduction to the trim-and-fill method see section S5.2.1.1 and the main text. Below are the summary results of the trim-and-fill method after running 1000 randomizations.
## lnRR for loop to extract one single effect size per study and running
## trim-and-fill
n.randomization.rr.TAF <- 1000 #number of samplings to be done
# # vector to collect the estimated number of missing studies for each database
# trimfill.n.vector.rr <- c() # vector to collect the ajusted effect for each
# database trimfill.b.vector.rr <- c() names(dataset.rr)[3] <- 'study' # rename
# the column for study for(i in 1:n.randomization.rr.TAF){ # creating a new
# dataframe to be filled randomized.db.rr <- data.frame(study=factor(),
# yi=numeric(), vi=numeric(), stringsAsFactors=FALSE) for (j in
# levels(as.factor(dataset.rr$study))){ #for each study do... # creating a
# dataset for each study tmp.rr <-
# dataset.rr[dataset.rr$study==j,c('study','yi','vi')] # randomly extracting one
# effect size per study randomized.db.rr <-
# rbind(randomized.db.rr,tmp.rr[sample(seq(1,nrow(tmp.rr)),1),]) } # running the
# model on the dataframe created in the for loop above model.tmp.rr <- rma(yi,
# vi, test='knha',data=randomized.db.rr) # extracting the number of missing
# studies according to the trim-and-fill method for each database
# trimfill.n.vector.rr <- c(trimfill.n.vector.rr,trimfill(model.tmp.rr)$k0) #
# extracting the adjusted overall effect according to the trim-and-fill method
# for each database trimfill.b.vector.rr <-
# c(trimfill.b.vector.rr,trimfill(model.tmp.rr)$beta[[1]]) } # creating a
# dataframe with the n.rrandomization.rr of estimated number of missing studies
# and adjusted overall effects trimfill.sampling.rr <-
# data.frame(n=trimfill.n.vector.rr, beta=trimfill.b.vector.rr) # saving data to
# avoid re-running for loop as that can take a couple of minutes if
# n.rrandomization.rr 1000 or more save(trimfill.sampling.rr,file =
# here('data','trimfill_sampling_rr_1000s.RData'))
# loading the data extracted from the for loop above to avoid re-running that for
# loop as that can take a couple of minutes if n.randomization.r 1000 or more
load(here("data", "trimfill_sampling_rr_1000s.RData"))
## SMD for loop to extract one single effect size per study and running
## trim-and-fill
n.randomization.smd.TAF <- 1000 #number of samplings to be done
# # vector to collect the estimated number of missing studies for each database
# trimfill.n.vector.smd <- c() # vector to collect the ajusted effect for each
# database trimfill.b.vector.smd <- c() names(dataset.smd)[3] <- 'study' # rename
# the column for study for(i in 1:n.randomization.smd.TAF){ # creating a new
# dataframe to be filled randomized.db.smd <- data.frame(study=factor(),
# yi=numeric(), vi=numeric(), stringsAsFactors=FALSE) for (j in
# levels(as.factor(dataset.smd$study))){ #for each study do... # creating a
# dataset for each study tmp.smd <-
# dataset.smd[dataset.smd$study==j,c('study','yi','vi')] # randomly extracting
# one effect size per study randomized.db.smd <-
# rbind(randomized.db.smd,tmp.smd[sample(seq(1,nrow(tmp.smd)),1),]) } # running
# the model on the dataframe created in the for loop above model.tmp.smd <-
# rma(yi, vi, test='knha',data=randomized.db.smd) # extracting the number of
# missing studies according to the trim-and-fill method for each database
# trimfill.n.vector.smd <- c(trimfill.n.vector.smd,trimfill(model.tmp.smd)$k0) #
# extracting the adjusted overall effect according to the trim-and-fill method
# for each database trimfill.b.vector.smd <-
# c(trimfill.b.vector.smd,trimfill(model.tmp.smd)$beta[[1]]) } # creating a
# dataframe with the n.smdandomization.smd of estimated number of missing studies
# and adjusted overall effects trimfill.sampling.smd <-
# data.frame(n=trimfill.n.vector.smd, beta=trimfill.b.vector.smd) # saving data
# to avoid re-running for loop as that can take a couple of minutes if
# n.smdandomization.smd 1000 or more save(trimfill.sampling.smd,file =
# here('data','trimfill_sampling_smd_1000s.RData'))
# loading the data extracted from the for loop above to avoid re-running that for
# loop as that can take a couple of minutes if n.randomization.r 1000 or more
load(here("data", "trimfill_sampling_smd_1000s.RData"))
The median number of missing studies for lnRR was 0 (range = 0 - 17; Figure S5.9A). Additionally, the median adjusted overall lnRR according to the trim-and-fill method was -0.17 (range = -0.2 - -0.11; Figure S5.9B), which is larger than the adjusted overall lnRR estimated by our suggested multilevel meta-regession method (-0.09, 95% CI = [-0.21,0.03]; more in section S4.2.4.3).
Distribution of 1000 (A) estimated number of missing studies according to the trim-and-fill method, and (B) adjusted overall lnRR according to the trim-and-fill method. The dashed line shows the median value.
# plotting the distribution of the number of missing studies according to the
# trim-and-fill method
n.rr <- ggplot(data = trimfill.sampling.rr, aes(x = n)) + geom_density(alpha = 0.5,
fill = "darkorchid4") + geom_vline(data = trimfill.sampling.rr, aes(xintercept = median(n)),
linetype = "dashed", size = 1)
# plotting the distribution of the adjusted overall effect according to the
# trim-and-fill method
betas.rr <- ggplot(data = trimfill.sampling.rr, aes(x = beta)) + geom_density(alpha = 0.5,
fill = "darkorchid4") + geom_vline(data = trimfill.sampling.rr, aes(xintercept = median(beta)),
linetype = "dashed", size = 1)
ggarrange(n.rr, betas.rr, labels = c("A", "B"), ncol = 2, nrow = 1)
The median number of missing studies for SMD was 0 (range = 0 - 1; Figure S5.10A). Additionally, the median adjusted overall SMD according to the trim-and-fill method was -0.5 (range = -0.57 - -0.43; Figure S5.10B), which is larger than the adjusted overall SMD estimated by our suggested multilevel meta-regession method (-0.28, 95% CI = [-0.56,0]; more in section S4.2.4.3).
Distribution of 1000 (A) estimated number of missing studies according to the trim-and-fill method, and (B) adjusted overall SMD according to the trim-and-fill method. The dashed line shows the median value.
# plotting the distribution of the number of missing studies according to the
# trim-and-fill method
n.smd <- ggplot(data = trimfill.sampling.smd, aes(x = n)) + geom_density(alpha = 0.5,
fill = "darkorchid4") + geom_vline(data = trimfill.sampling.smd, aes(xintercept = median(n)),
linetype = "dashed", size = 1)
# plotting the distribution of the adjusted overall effect according to the
# trim-and-fill method
betas.smd <- ggplot(data = trimfill.sampling.smd, aes(x = beta)) + geom_density(alpha = 0.5,
fill = "darkorchid4") + geom_vline(data = trimfill.sampling.smd, aes(xintercept = median(beta)),
linetype = "dashed", size = 1)
ggarrange(n.smd, betas.smd, labels = c("A", "B"), ncol = 2, nrow = 1)
For an introduction to selection models see section S5.2.1.2 and the main text. As before, we ran a step function selection model based on one cut-point (α = 0.05); this model is sometimes referred to as a 3 parameter selection model. For simplicity, we only ran the selection model for the intercept-only model (but see section S5.2.1.2 for how to run a selection model for a meta-regression). Below are the summary results of the 3 parameter selection model after running 1000 randomizations.
# lnRR for loop to extract one single effect size per study and running selection
# model
n.randomization.rr.3PSM <- 1000
# # vector to collect the ajusted effect for each database threePSM.vector.rr <-
# c() for(i in 1:n.randomization.rr.3PSM){ # creating a new dataframe to be
# filled randomized.db.rr <- data.frame(study=factor(), yi=numeric(),
# vi=numeric(), year.c=numeric(), Experimental.or.Observational.=character(),
# stringsAsFactors=FALSE) for (j in levels(as.factor(dataset.rr$study))){ #for
# each study do... # creating a dataset for each study tmp.rr <-
# dataset.rr[dataset.rr$study==j,c('study','yi','vi','year.c','Experimental.or.Observational.')]
# # randomly extracting one effect size per study randomized.db.rr <-
# rbind(randomized.db.rr,tmp.rr[sample(seq(1,nrow(tmp.rr)),1),]) } # running the
# model on the dataframe created in the for loop above model.tmp.rr.agg0<-
# rma(yi, vi, method='ML', test='knha',data=randomized.db.rr) # extracting the
# adjusted overall effect according to the selection model for each database
# threePSM.vector.rr <- c(threePSM.vector.rr,selmodel(model.tmp.rr.agg0,
# type='stepfun', steps=c(0.05))$beta[[1]]) } # creating a dataframe with the
# n.randomization.r of estimated number of missing studies and adjusted overall
# effects threePSM.sampling.rr <- data.frame(beta=threePSM.vector.rr) # # saving
# data to avoid re-running for loop as that can take a couple of minutes if
# n.randomization.r 1000 or more save(threePSM.sampling.rr,file =
# here('data','3PSM_sampling_rr_1000s.RData')) loading the data extracted from
# the for loop above to avoid re-running that for loop as that can take a couple
# of minutes if n.randomization.r 1000 or more
load(here("data", "3PSM_sampling_rr_1000s.RData"))
The median adjusted overall lnRR according to the 3 parameter selection model was -0.07 (range = -0.11 - -0.01; Figure S5.4B), which is smaller than the adjusted overall lnRR estimated by our suggested mulitlevel meta-regession method (-0.09, 95% CI = [-0.21,0.03]; more in section S4.2.4.3).
Distribution of 1000 adjusted overall lnRR according to the 3 parameter selection model. The dashed line shows the median value.
# plotting the distribution of adjusted overall effect sizes according to the 3
# parameter selection model
ggplot(data = threePSM.sampling.rr, aes(x = beta)) + geom_density(alpha = 0.5, fill = "darkorchid4") +
geom_vline(data = threePSM.sampling.rr, aes(xintercept = median(beta)), linetype = "dashed",
size = 1)
# lnRR for loop to extract one single effect size per study and running selection
# model
n.randomization.smd.3PSM <- 1000
# # vector to collect the ajusted effect for each database threePSM.vector.smd <-
# c() for(i in 1:n.randomization.smd.3PSM){ # creating a new dataframe to be
# filled randomized.db.smd <- data.frame(study=factor(), yi=numeric(),
# vi=numeric(), year.c=numeric(), Experimental.or.Observational.=character(),
# stringsAsFactors=FALSE) for (j in levels(as.factor(dataset.smd$study))){ #for
# each study do... # creating a dataset for each study tmp.smd <-
# dataset.smd[dataset.smd$study==j,c('study','yi','vi','year.c','Experimental.or.Observational.')]
# # randomly extracting one effect size per study randomized.db.smd <-
# rbind(randomized.db.smd,tmp.smd[sample(seq(1,nrow(tmp.smd)),1),]) } # running
# the model on the dataframe created in the for loop above model.tmp.smd.agg0<-
# rma(yi, vi, method='ML', test='knha',data=randomized.db.smd) # extracting the
# adjusted overall effect according to the selection model for each database
# threePSM.vector.smd <- c(threePSM.vector.smd,selmodel(model.tmp.smd.agg0,
# type='stepfun', steps=c(0.05))$beta[[1]]) } # creating a dataframe with the
# n.randomization.r of estimated number of missing studies and adjusted overall
# effects threePSM.sampling.smd <- data.frame(beta=threePSM.vector.smd) # #
# saving data to avoid re-running for loop as that can take a couple of minutes
# if n.randomization.r 1000 or more save(threePSM.sampling.smd,file =
# here('data','3PSM_sampling_smd_1000s.RData')) loading the data extracted from
# the for loop above to avoid re-running that for loop as that can take a couple
# of minutes if n.randomization.r 1000 or more
load(here("data", "3PSM_sampling_smd_1000s.RData"))
The median adjusted overall SMD according to the 3 parameter selection model was -0.29 (range = -0.39 - -0.19; Figure S5.4B), which is smaller than the adjusted overall SMD estimated by our suggested mulitlevel meta-regession method (-0.28, 95% CI = [-0.56,0]; more in section S4.2.4.3).
Distribution of 1000 adjusted overall SMD according to the 3 parameter selection model. The dashed line shows the median value.
# plotting the distribution of adjusted overall effect sizes according to the 3
# parameter selection model
ggplot(data = threePSM.sampling.smd, aes(x = beta)) + geom_density(alpha = 0.5, fill = "darkorchid4") +
geom_vline(data = threePSM.sampling.smd, aes(xintercept = median(beta)), linetype = "dashed",
size = 1)
For an introduction to cumulative meta-analysis see section S5.2.1.3 and the main text. Here, we performed a sampling procedure where we ran a cumulative meta-analysis test for each dataset and extracted the mean effects estimated by the cumulative meta-analysis. We then ploted the median and 95% quantiles of those mean effects at each step. As before, the forest plot does not show any clear evidence of decline effects in this dataset (Figure S5.6).
# lnRR for loop to extract one single effect size per study and running
# cumulative meta-analysis
n.randomization.rr.CM <- 1000 #number of samplings to be done
# # vector to collect the effects estimated over time cumul.vector.rr <- c()
# for(i in 1:n.randomization.rr.CM){ # creating a new dataframe to be filled
# randomized.db.rr <- data.frame(study=factor(), yi=numeric(), vi=numeric(),
# year=numeric(), stringsAsFactors=FALSE) for (j in
# levels(as.factor(dataset.rr$study))){ #for each study do... # # creating a
# dataset for each study tmp.rr <-
# dataset.rr[dataset.rr$study==j,c('study','yi','vi','year')] # randomly
# extracting one effect size per study randomized.db.rr <-
# rbind(randomized.db.rr,tmp.rr[sample(seq(1,nrow(tmp.rr)),1),]) } # running the
# model on the dataframe created in the for loop above model.tmp.rr <- rma(yi,
# vi, test='knha',data=randomized.db.rr) # extracting the mean estimates
# estimated by the cumulative meta-analysis for each database cumul.vector.rr <-
# c(cumul.vector.rr,as.vector(cumul(model.tmp.rr,order=order(randomized.db.rr$year))$estimate))
# } # creating a dataframe with the order and the mean estimate from each
# database cumul.sampling.rr <-
# data.frame(order=rep(seq(1,nrow(randomized.db.rr)),n.randomization.rr.CM),
# beta=cumul.vector.rr) # saving data to avoid re-running for loop as that can
# take a few minutes if n.rrandomization.rr 1000 or more
# save(cumul.sampling.rr,file = here('data','cumul_sampling_rr_1000s.RData'))
# loading the data extracted from the for loop above to avoid re-running that for
# loop as that can take a few minutes if n.rrandomization.rr 1000 or more
load(here("data", "cumul_sampling_rr_1000s.RData"))
# SMD for loop to extract one single effect size per study and running cumulative
# meta-analysis
n.randomization.smd.CM <- 1000 #number of samplings to be done
# vector to collect the effects estimated over time cumul.vector.smd <- c() for(i
# in 1:n.randomization.smd.CM){ # creating a new dataframe to be filled
# randomized.db.smd <- data.frame(study=factor(), yi=numeric(), vi=numeric(),
# year=numeric(), stringsAsFactors=FALSE) for (j in
# levels(as.factor(dataset.smd$study))){ #for each study do... # # creating a
# dataset for each study tmp.smd <-
# dataset.smd[dataset.smd$study==j,c('study','yi','vi','year')] # randomly
# extracting one effect size per study randomized.db.smd <-
# rbind(randomized.db.smd,tmp.smd[sample(seq(1,nrow(tmp.smd)),1),]) } # running
# the model on the dataframe created in the for loop above model.tmp.smd <-
# rma(yi, vi, test='knha',data=randomized.db.smd) # extracting the mean estimates
# estimated by the cumulative meta-analysis for each database cumul.vector.smd <-
# c(cumul.vector.smd,as.vector(cumul(model.tmp.smd,order=order(randomized.db.smd$year))$estimate))
# } # creating a dataframe with the order and the mean estimate from each
# database cumul.sampling.smd <-
# data.frame(order=rep(seq(1,nrow(randomized.db.smd)),n.randomization.smd.CM),
# beta=cumul.vector.smd) # saving data to avoid re-running for loop as that can
# take a few minutes if n.smdandomization.smd 1000 or more
# save(cumul.sampling.smd,file = here('data','cumul_sampling_smd_1000s.RData'))
# loading the data extracted from the for loop above to avoid re-running that for
# loop as that can take a few minutes if n.smdandomization.smd 1000 or more
load(here("data", "cumul_sampling_smd_1000s.RData"))
Median and 95% quantiles of 1000 lnRR estimated according to cumulative meta-analysis showing evidence of decline effects.
cumul.sampling.rr.summarized <- as.data.frame(cumul.sampling.rr
%>% group_by(order) %>%
summarise(median = median(beta),
lowCI = quantile(beta,probs = c(0.025)),
hiCI = quantile(beta,probs = c(0.975))))
# reverses the factor level ordering for labels after coord_flip()
cumul.sampling.rr.summarized$order <- factor(cumul.sampling.rr.summarized$order, levels=rev(cumul.sampling.rr.summarized$order))
ggplot(data=cumul.sampling.rr.summarized, aes(x=order, y=median, ymin=lowCI, ymax=hiCI)) +
geom_pointrange() +
geom_hline(yintercept=0, lty=2) + # add a dotted line at x=1 after flip
coord_flip() + # flip coordinates (puts labels on y axis)
xlab("Order of publication") + ylab("Overall lnRR (95% CI)") +
theme_bw()
Median and 95% quantiles of 1000 SMD estimated according to cumulative meta-analysis showing evidence of decline effects.
cumul.sampling.smd.summarized <- as.data.frame(cumul.sampling.smd
%>% group_by(order) %>%
summarise(median = median(beta),
lowCI = quantile(beta,probs = c(0.025)),
hiCI = quantile(beta,probs = c(0.975))))
# reverses the factor level ordering for labels after coord_flip()
cumul.sampling.smd.summarized$order <- factor(cumul.sampling.smd.summarized$order, levels=rev(cumul.sampling.smd.summarized$order))
ggplot(data=cumul.sampling.smd.summarized, aes(x=order, y=median, ymin=lowCI, ymax=hiCI)) +
geom_pointrange() +
geom_hline(yintercept=0, lty=2) + # add a dotted line at x=1 after flip
coord_flip() + # flip coordinates (puts labels on y axis)
xlab("Order of publication") + ylab("Overall SMD (95% CI)") +
theme_bw()
There is no code for Fig. 1 (created in PowerPoint)
Rose??
# creating data
set.seed(77777)
# setting parameters
n.effect <- 100
sigma2.s <- 0.05
beta1 <- 0.2
# using negative binomial we get good spread of sample size
n.sample <- rnbinom(n.effect, mu = 30, size = 0.7) + 4
# variance for Zr
vi <- 1/(n.sample - 3)
# moderator x 1
xi <- rnorm(n.effect)
# there is underling overall effect to r = 0.2 or Zr = 0.203
Zr <- atanh(0.2) + beta1*xi + rnorm(n.effect, 0, sqrt(sigma2.s)) + rnorm(n.effect, 0, sqrt(vi))
#qplot(Zr, 1/sqrt(vi))
dat <- data.frame(yi = Zr, vi = vi, sei = sqrt(vi), xi = xi, ni = n.sample, prec = 1 / sqrt(vi), wi = 1/vi, zi = Zr/sqrt(vi))
rows <- 1:nrow(dat)
expected <- which(1/dat$sei < 5 & dat$yi < 0.25)
unexpected <- which(1/dat$sei > 4.7 & dat$yi > 0.25)
col_point <- ifelse(rows %in% expected, "red", ifelse(rows %in% unexpected, "blue", "black"))
dat$col_point <- col_point
#https://stackoverflow.com/questions/50012553/combining-plots-created-by-r-base-lattice-and-ggplot2
# saving base plot as an object!!! https://www.andrewheiss.com/blog/2016/12/08/save-base-graphics-as-pseudo-objects-in-r/
mod_ra <- rma(yi, vi, data = dat)
mod_fe <- rma(yi, vi, data = dat, method = "FE")
# margin controlling
#par(mar=c(5,4,0,0) + 0.1)
# - done TODO - DL for sunset plot and enhanced counter - making it - 0.01 < p < 0.05
# a
pdf(NULL)
dev.control(displaylist="enable")
par(mar=c(4,4,0.1,0) + 0.1)
funnel(dat$yi, dat$vi, ni = dat$ni, yaxis="ni",
xlim = c(-3, 3),
refline=mod_ra$beta, xlab = "Effect size (Zr)")
a <- recordPlot()
invisible(dev.off())
# b
pdf(NULL)
dev.control(displaylist="enable")
par(mar=c(4,4,0.1,0) + 0.1)
funnel(mod_ra, yaxis="seinv", col = col_point,
ylim = c(1, 12), xlim = c(-3, 3),
ylab = "Precison (1/SE)", xlab = "Effect size (Zr)")
legend(x = 1.4, y = 12, legend = c("expected", "","unexpected"), pch = c(19, 0, 19), col = c("red", "grey83", "blue"), bty = "n")
b <- recordPlot()
invisible(dev.off())
# c
pdf(NULL)
dev.control(displaylist="enable")
par(mar=c(4,4,0.1,0) + 0.1)
funnel(mod_ra,
yaxis = "sei",
xlim = c(-3, 3),
level = c(95, 99),
shade = c("white", "gray55"),
refline = 0, legend = F,
ylab = "Standard error (SE)", xlab = "Effect size (Zr)")
legend(x = 1, y = 0, legend = c("0.01 < p < 0.05"), pch = c(15), col = c("gray55"), bty = "n")
c <- recordPlot()
invisible(dev.off())
# d
d <- viz_sunset(dat[,c("yi", "sei")],
power_contours = "continuous",
contours = T,
power_stats = F,
xlab = "Effect size (Zr)",
ylab = "Standard error (SE)")
# e
# meta-regression (random-effects model)
mod_re1 <- rma(yi, vi, mod = ~ xi, data = dat)
pdf(NULL)
dev.control(displaylist="enable")
par(mar=c(4,4,0.1,0) + 0.1)
funnel(mod_re1,
yaxis = "sei",
xlim = c(-3, 3), ylim = c(0, 1),
#level = c(95, 99),
#shade = c("white", "gray75"),
refline = 0, legend = F, ylab = "Standard error (SE)", xlab = "Standardized residuls (Zr)")
#legend(x = 1, y = 0, legend = c("0.01 < p < 0.05", ""), pch = c(15, 0), col = c("gray75", "white"), bg = "white")
e <- recordPlot()
invisible(dev.off())
# f
#
mod <- metagen(TE = yi, seTE = sqrt(vi) , data = dat)
pdf(NULL)
dev.control(displaylist="enable")
par(mar=c(4,4,0.1,0) + 0.1)
#radial.rma(mod_fe, xlim = c(0, 12.5), zlab = "Z score", xlab = "Precision (1/SE)")
#legend(1,1, "test", bty = "n")
radial(mod, level = 0.95, pch = 19, xlab = "Precision (1/SE)", ylab = "Z score")
#abline(a = 0, b = 0.1908, lwd = 1.5)
f <- recordPlot()
invisible(dev.off())
# fig 3
fig3 <- (ggdraw(a) + ggdraw(b)) /(ggdraw(c) + ggdraw(d))/ (plot_grid(e) + ggdraw(f)) + plot_annotation(tag_levels = 'a')
fig3
# creating data with publication bias
dat2 <- dat[dat$col_point != "red", ]
# meta-analysis
mod_ra2 <- rma(yi, vi, data = dat2)
# egger regression (the same as in S2)
egger1 <- lm(zi ~ prec, data = dat2)
egger2 <- lm(yi ~ sei, weight = wi, data = dat2)
# data for time-lag bias
set.seed(123)
position <- sample(1:nrow(dat2), size = 15)
dat3 <- dat2[sort(position), ]
sorting <- c(12, 13, 14, 3, 4, 9, 8, 5, 10, 11, 6, 7, 2, 15, 1)
dat3 <- dat3[sorting, ]
dat3$year <- c(2000, 2001, 2003, 2005, 2006, 2007, 2008, 2009, 2010, 2012, 2013, 2014, 2016, 2018, 2019)
# cumlatie meta-analysis
cum <- rma(yi, vi, data=dat3)
cum_ma <- cumul(cum)
#forest(cum_ma, xlab = "Overall estimate (Zr)")
# meta-regression
mod_re2 <- rma(yi, vi, mod = ~ year, data = dat3)
# funnel
taf <- trimfill(mod_ra2)
# Plotting from here
p1 <- ggplot(dat2, aes(prec, zi)) + geom_point() +
geom_hline(yintercept = 0, linetype = "dashed") +
geom_hline(yintercept = egger1$coefficients[[1]]) +
#geom_smooth(method = "lm") +
labs(x = "Precision (1/SE)",
y = "Z score")
p2 <- ggplot(dat2, aes(sei, yi)) + geom_point() +
# geom_smooth(method = "lm") +
geom_hline(yintercept = 0, linetype = "dashed") +
geom_abline(intercept = egger2$coefficients[[1]], slope = egger2$coefficients[[2]]) +
labs(x = "Standard error (SE)",
y = "Effect size (Zr)")
# c and d
# probably 15 data points for cumulative meta-analyses
cum <- rma(yi, vi, data=dat3)
cum_ma <- cumul(cum)
# p3
pdf(NULL)
dev.control(displaylist="enable")
par(mar=c(4,4,0.1,0) + 0.1)
forest(cum_ma, xlab = "Overall estimate (Zr)")
p3 <- recordPlot()
invisible(dev.off())
# p4
p4 <- ggplot(dat3, aes(x = year, y = yi, size = prec)) +
geom_hline(yintercept = 0, linetype = "dashed") +
geom_abline(intercept = mod_re2$beta[[1]], slope = mod_re2$beta[[2]]) +
geom_point(shape = 21, fill = "grey90") +
labs(x = "Publication year", y = "Effect size (Zr)", size ="Precision (1/SE)") +
guides(fill = "none", colour = "none") +
theme(legend.position = c(0, 0.15), legend.justification = c(0, 1)) +
theme(legend.direction = "horizontal") +
# theme(legend.background = element_rect(fill = "white", colour = "black")) +
theme(legend.background = element_blank()) +
theme(axis.text.y = element_text(size = 10, colour = "black", hjust = 0.5, angle = 90))
# p5
pdf(NULL)
dev.control(displaylist="enable")
par(mar=c(4,4,0.1,0) + 0.1)
funnel(taf, yaxis="seinv", col = col_point, estimator="R0",
ylim = c(1, 12),
#xlim = c(-3, 3),
ylab = "Precison (1/SE)", xlab = "Effect size (Zr)")
p5 <- recordPlot()
invisible(dev.off())
# p6
pdf(NULL)
dev.control(displaylist="enable")
par(mar=c(4,4,0.1,0) + 0.1)
funnel(taf, yaxis="sei", col = col_point, estimator="R0",
#xlim = c(-3, 3),
ylab = "Standard error (SE)", xlab = "Effect size (Zr)")
p6 <- recordPlot()
invisible(dev.off())
# e and f
# maybe use the same datasets as Fig 3
# fig 4
fig4 <- (plot_grid(p1) + plot_grid(p2)) /(ggdraw(p3) + plot_grid(p4))/(plot_grid(p5) + plot_grid(p6)) + plot_annotation(tag_levels = 'a')
fig4
# dmetr
# devtools::install_github('MathiasHarrer/dmetar') library(dmetar)
p_dat <- data.frame(studlab = as.factor(1:nrow(dat2)), TE = dat2$yi, seTE = dat2$sei)
# pcurve(p_dat)
pdf(NULL)
dev.control(displaylist = "enable")
par(mar = c(4, 4, 0.1, 0) + 0.1)
invisible(pcurve(p_dat))
plot <- recordPlot()
invisible(dev.off())
# turuning the base plot into ggplot
plot <- ggdraw(plot)
# dat2
model <- rma(yi, vi = vi, data = dat2, method = "ML", mod = ~xi)
# only sel1 and sel3 are used sel0 <- selmodel(model, type = 'beta')
sel1 <- selmodel(model, type = "halfnorm", prec = "sei", scaleprec = FALSE)
# sel2 <- selmodel(model, type='negexp', prec='sei', scaleprec=FALSE)
sel3 <- selmodel(model, type = "logistic", prec = "sei", scaleprec = FALSE)
# sel4 <- selmodel(model, type='power', prec='sei', scaleprec=FALSE)
sel1b <- selmodel(model, type = "halfnorm")
# sel2b <- selmodel(model, type='negexp')
sel3b <- selmodel(model, type = "logistic")
# sel4b <- selmodel(model, type='power')
# step function
sel5 <- selmodel(model, type = "stepfun", steps = c(0.05, 0.1, 0.5, 1))
# sel6 <- selmodel(model, type='stepfun', steps=c(0.001, 0.01, 0.05, 0.10, 0.25,
# 0.50, 0.75, 1))
sel7 <- selmodel(model, type = "stepfun", steps = c(0.05, 1))
# plot(sel5, ylim = c(0,1)) plot(sel6, add=TRUE, col='red', scale = T) plot(sel7,
# add=TRUE, col='red') legend(x = 0.5, y = 1, legend = c('3 cutpoints', '','1
# cutpoint (3 PMS)'), lty =c(1, 0, 1), col = c('black', '', 'red'), bty = 'n')
# plotting from here
pA <- plot + theme(plot.margin = unit(c(-1, 0, -1, 0), "cm"))
# B
pdf(NULL)
dev.control(displaylist = "enable")
par(mar = c(4, 4, 0.1, 0) + 0.1)
plot(sel1)
plot(sel3, add = TRUE, col = "red")
plot(sel1b, add = TRUE, lty = "dotted")
plot(sel3b, add = TRUE, col = "red", lty = "dotted")
legend(x = 0.6, y = 1, legend = c("half-normal", "", "logistic"), lty = c(1, 0, 1),
col = c("black", "", "red"), bty = "n")
pB <- recordPlot()
invisible(dev.off())
# C
pdf(NULL)
dev.control(displaylist = "enable")
par(mar = c(4, 4, 0.1, 0) + 0.1)
plot(sel5, ylim = c(0, 1))
plot(sel7, add = TRUE, col = "red", lty = 3)
legend(x = 0.5, y = 1, legend = c("3 cutpoints", "", "1 cutpoint (3 PSM)"), lty = c(1,
0, 1), col = c("black", "", "red"), bty = "n")
pC <- recordPlot()
invisible(dev.off())
# Fig 5
fig5 <- pA/(ggdraw(pB) + ggdraw(pC)) + plot_annotation(tag_levels = "a")
fig5
# Zr data creation
set.seed(777)
# setting parameters
n.study <- 70
sigma2.s <- 0.05
sigma2.u <- 0.05
beta1 <- 0.1
beta2 <- -0.05
# using negative binomial we get good spread of sample size
n.sample <- rnbinom(n.study, mu = 30, size = 0.2) + 4
# the number of effect size per study
set.seed(777777)
n.es.per.s <- rpois(n.study, 3) + 1
# sum(n.es.per.s)
n.effect <- sum(n.es.per.s)
# variance for Zr
vi <- 1/(rep(n.sample, n.es.per.s) - 3)
# In the text, we have two moderators moderator x 1 - study level
x1j <- rnorm(n.study)
# moderator x 2 - effect isze level
x2i <- rnorm(n.effect)
# study position
place <- rep(1:n.study, n.es.per.s)
# there is underling overall effect to r = 0.3 or Zr = 0.3095196
Zr <- 0 + rnorm(n.study, 0, sqrt(sigma2.s))[place] + rnorm(n.effect, 0, sqrt(sigma2.u)) +
rnorm(n.effect, 0, sqrt(vi))
# qplot(Zr, 1/sqrt(vi))
datA <- data.frame(yi = Zr, vi = vi, sei = sqrt(vi), x1 = x1j[place], x2 = x2i, ni = n.sample[place],
prec = 1/sqrt(vi), wi = 1/vi, zi = Zr/sqrt(vi), studyID = place, effectID = 1:n.effect)
# cutting publication mbias like stuff
expected <- c(which(1/datA$sei < 5.5 & datA$yi < 0.2), which(1/datA$sei < 3 & datA$yi <
0.7))
datB <- datA[-expected, ]
# qplot(x = yi, y = 1/sqrt(vi), data = datB)
mod_ma <- rma.mv(yi, vi, mod = ~sqrt(vi), random = list(~1 | studyID, ~1 | effectID),
data = datB)
mod_mb <- rma.mv(yi, vi, mod = ~vi, random = list(~1 | studyID, ~1 | effectID), data = datB)
# summary(mod_ma) summary(mod_mb)
# creating curve lines
x = seq(0, 1, by = 0.001)
# x <- x[-1000]
y = mod_mb$beta[[1]] + sqrt(x) * mod_mb$beta[[2]]
vi_curve <- tibble(x = x, y = y)
# lnRR
set.seed(777)
# setting parameters
n.study <- 30
sigma2.s <- 0.01
sigma2.u <- 0.005
beta1 <- 0.1
beta2 <- -0.01
# using negative binomial we get good spread of sample size
n.sample <- rnbinom(n.study, mu = 40, size = 0.5) + 4
# the number of effect size per study
set.seed(7777)
n.es.per.s <- rpois(n.study, 3) + 1
# sum(n.es.per.s)
# sample size
n.effect <- sum(n.es.per.s)
# cv for each study
cv <- sample(seq(0.3, 0.35, by = 0.001), n.effect, replace = T)
rep_sample <- rep(n.sample, times = n.es.per.s)
# variance for lnRR
vi <- (2 * cv^2)/rep_sample
# In the text, we have two moderators moderator x 1 - study level
x1j <- rnorm(n.study)
# moderator x 2 - effect isze level
x2i <- rnorm(n.effect)
# study position
pos <- rep(1:n.study, n.es.per.s)
# there is underling overall effect to lnRR = 0.2
dat3 <- data.frame(vi = vi, sei = sqrt(vi), x1 = x1j[pos], x2 = x2i, ni = n.sample[pos],
prec = 1/sqrt(vi), wi = 1/vi, studyID = pos, effectID = 1:n.effect)
# M matrix (V = VCV)
sigma <- make_VCV_matrix(dat3, "vi", "studyID", "effectID", rho = 0)
lnRR <- 0.3 + beta1 * x1j[pos] + beta2 * x2i + rnorm(n.study, 0, sqrt(sigma2.s))[pos] +
rnorm(n.effect, 0, sqrt(sigma2.u)) + MASS::mvrnorm(1, rep(0, n.effect), sigma)
dat3$yi <- lnRR
mod_ml <- rma.mv(yi, vi, mod = ~x1 + x2, random = list(~1 | studyID, ~1 | effectID),
data = dat3)
# residuals - stadnarised summary(mod_ml) funnel(mod_ml, yaxis='seinv', col =
# inferno(25)[pos])
# residual marginal
residm <- dat3$yi - predict(mod_ml)[[1]]
vi_residm <- vi + predict(mod_ml)[[2]]^2 # adding prediction errors
# residual conditional 1 & 2 + its error
blups <- ranef(mod_ml)
study_effect <- blups$studyID[[1]][pos]
vi_study <- (blups$studyID[[2]][pos])^2
residc1 <- dat3$yi - (predict(mod_ml)[[1]] + study_effect)
vi_residc1 <- vi_residm + vi_study
effect_effect <- blups$effectID[[1]]
vi_effect <- blups$effectID[[1]]^2
residc2 <- dat3$yi - (predict(mod_ml)[[1]] + study_effect + effect_effect)
vi_residc2 <- vi_residm + vi_study + vi_effect
# plotting Panel A
mod1 <- rma(yi, vi, data = dat3)
pdf(NULL)
dev.control(displaylist = "enable")
par(mar = c(4, 4, 0.1, 0) + 0.1)
funnel(mod1, col = inferno(25)[pos], ylim = c(0, 0.26), xlim = c(-0.6, 0.8), ylab = "Standard error (SE)",
xlab = "Effect size (lnRR)")
pa <- recordPlot()
invisible(dev.off())
# Panel B
pdf(NULL)
dev.control(displaylist = "enable")
par(mar = c(4, 4, 0.1, 0) + 0.1)
funnel(residm, vi = vi_residm, yaxis = "sei", col = inferno(25)[pos], ylim = c(0,
0.26), xlim = c(-0.6, 0.8), ylab = "Standard error (SE)", xlab = "Residuals marginal (lnRR)")
pb <- recordPlot()
invisible(dev.off())
# Panel C
pdf(NULL)
dev.control(displaylist = "enable")
par(mar = c(4, 4, 0.1, 0) + 0.1)
funnel(residc1, vi = vi_residc1, yaxis = "sei", col = inferno(25)[pos], ylim = c(0,
0.26), xlim = c(-0.6, 0.8), ylab = "Standard error (SE)", xlab = "Residuals conditional 1 (lnRR)")
pc <- recordPlot()
invisible(dev.off())
# Panel D
pdf(NULL)
dev.control(displaylist = "enable")
par(mar = c(4, 4, 0.1, 0) + 0.1)
funnel(residc2, vi = vi_residc2, yaxis = "sei", col = inferno(25)[pos], ylim = c(0,
0.26), xlim = c(-0.6, 0.8), ylab = "Standard error (SE)", xlab = "Residuals conditional 2 (lnRR)")
pd <- recordPlot()
invisible(dev.off())
# Panel E
pe <- ggplot(datB, aes(sei, yi)) + geom_point() + geom_line(data = vi_curve, mapping = aes(x = x,
y = y, col = "red")) + # geom_point(aes(col = studyID)) + geom_smooth(method = 'lm') +
geom_hline(yintercept = 0, linetype = "dashed") + geom_abline(intercept = mod_ma$beta[[1]],
slope = mod_ma$beta[[2]]) + xlim(0, 1) + ylim(-1.5, 2.5) + labs(x = "Standard error (SE)",
y = "Effect size (Zr)") + theme(legend.position = "none")
# F
# Variance
pf <- ggplot(datB, aes(vi, yi)) + geom_point() + # geom_point(aes(col = studyID)) + geom_smooth(method = 'lm') +
geom_hline(yintercept = 0, linetype = "dashed") + geom_abline(intercept = mod_mb$beta[[1]],
slope = mod_mb$beta[[2]], col = "red") + xlim(0, 1) + ylim(-1.5, 2.5) + labs(x = "Sampling variance",
y = "Effect size (Zr)")
# fig6
fig6 <- (ggdraw(pa) + ggdraw(pb))/(ggdraw(pc) + ggdraw(pd))/(plot_grid(pe) + plot_grid(pf)) +
plot_annotation(tag_levels = "a")
fig6
There is no code for Fig. 7 (created in PowerPoint)
sessionInfo() %>% pander()
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
locale: en_AU.UTF-8||en_AU.UTF-8||en_AU.UTF-8||C||en_AU.UTF-8||en_AU.UTF-8
attached base packages: stats, graphics, grDevices, utils, datasets, methods and base
other attached packages: ggpubr(v.0.4.0), metaAidR(v.0.0.0.9000), viridis(v.0.5.1), viridisLite(v.0.3.0), dmetar(v.0.0.9000), ggplotify(v.0.0.5), meta(v.4.16-2), metaviz(v.0.3.1), cowplot(v.1.1.1), orchaRd(v.0.0.0.9000), readxl(v.1.3.1), lme4(v.1.1-26), here(v.1.0.1), patchwork(v.1.1.1), kableExtra(v.1.3.1), ape(v.5.4-1), rotl(v.3.0.11), openxlsx(v.4.2.3), pander(v.0.6.3), gt(v.0.2.2), metafor(v.2.5-82), Matrix(v.1.3-2), forcats(v.0.5.1), stringr(v.1.4.0), dplyr(v.1.0.4), purrr(v.0.3.4), readr(v.1.4.0), tidyr(v.1.1.2), tibble(v.3.0.6), ggplot2(v.3.3.3) and tidyverse(v.1.3.0)
loaded via a namespace (and not attached): minqa(v.1.2.4), colorspace(v.2.0-0), ggsignif(v.0.6.0), rio(v.0.5.16), class(v.7.3-18), modeltools(v.0.2-23), ellipsis(v.0.3.1), mclust(v.5.4.7), rprojroot(v.2.0.2), fs(v.1.5.0), rstudioapi(v.0.13), farver(v.2.0.3), ggrepel(v.0.9.1), flexmix(v.2.3-17), lubridate(v.1.7.9.2), mathjaxr(v.1.4-0), xml2(v.1.3.2), codetools(v.0.2-18), splines(v.4.0.3), robustbase(v.0.93-7), knitr(v.1.31), jsonlite(v.1.7.2), nloptr(v.1.2.2.2), broom(v.0.7.4), kernlab(v.0.9-29), cluster(v.2.1.0), dbplyr(v.2.1.0), BiocManager(v.1.30.10), rentrez(v.1.2.3), compiler(v.4.0.3), httr(v.1.4.2), rvcheck(v.0.1.8), backports(v.1.2.1), assertthat(v.0.2.1), cli(v.2.3.0), formatR(v.1.7), htmltools(v.0.5.1.1), prettyunits(v.1.1.1), tools(v.4.0.3), gtable(v.0.3.0), glue(v.1.4.2), Rcpp(v.1.0.6), carData(v.3.0-4), cellranger(v.1.1.0), vctrs(v.0.3.6), nlme(v.3.1-151), fpc(v.2.2-9), xfun(v.0.20), rvest(v.0.3.6), CompQuadForm(v.1.4.3), lifecycle(v.0.2.0), rstatix(v.0.6.0), statmod(v.1.4.35), XML(v.3.99-0.5), DEoptimR(v.1.0-8), MASS(v.7.3-53), scales(v.1.1.1), hms(v.1.0.0), parallel(v.4.0.3), RColorBrewer(v.1.1-2), curl(v.4.3), yaml(v.2.2.1), gridExtra(v.2.3), MuMIn(v.1.43.17), stringi(v.1.5.3), highr(v.0.8), poibin(v.1.5), boot(v.1.3-26), zip(v.2.1.1), prabclus(v.2.3-2), rlang(v.0.4.10), pkgconfig(v.2.0.3), rncl(v.0.8.4), evaluate(v.0.14), lattice(v.0.20-41), labeling(v.0.4.2), tidyselect(v.1.1.0), magrittr(v.2.0.1), R6(v.2.5.0), generics(v.0.1.0), DBI(v.1.1.1), foreign(v.0.8-81), pillar(v.1.4.7), haven(v.2.3.1), withr(v.2.4.1), nnet(v.7.3-15), abind(v.1.4-5), car(v.3.0-10), modelr(v.0.1.8), crayon(v.1.4.1), rmarkdown(v.2.6), progress(v.1.2.2), grid(v.4.0.3), data.table(v.1.13.6), netmeta(v.1.3-0), diptest(v.0.75-7), reprex(v.1.0.0), digest(v.0.6.27), webshot(v.0.5.2), numDeriv(v.2016.8-1.1), gridGraphics(v.0.5-1), stats4(v.4.0.3), munsell(v.0.5.0) and magic(v.1.5-9)
To be added.